Context Management for Deep Agents
As AI agents tackle increasingly complex and long-running tasks, effective context management is critical to prevent context rot and to stay within the finite context windows of LLMs. The Deep Agents SDK by LangChain provides an agent harness with built-in features designed to facilitate context compression.
Context compression refers to techniques that reduce the volume of information in an agent’s working memory while preserving task-relevant details. Deep Agents implements a filesystem abstraction that allows agents to offload and retrieve information as needed.
Core Compression Techniques
The Deep Agents SDK applies three main compression techniques, each triggered at a different threshold of the model’s context window.
1. Offloading Large Tool Results
When a tool invocation returns a large response (e.g., reading a large file), the SDK automatically offloads the full response to the filesystem. In the agent’s active context, the response is replaced with a file path reference and a brief preview. The agent can then use filesystem tools like read_file or search to access the full content if needed.
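The mechanics of this can be illustrated with a minimal sketch. This is not the SDK’s actual implementation; the function name, the 2,000-character threshold, and the preview length are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

OFFLOAD_THRESHOLD = 2_000  # offload results longer than this many characters (assumed)
PREVIEW_CHARS = 200        # how much of the result stays in the active context (assumed)

def offload_tool_result(result: str, workdir: Path) -> str:
    """Return the text to keep in context, persisting large results to disk."""
    if len(result) <= OFFLOAD_THRESHOLD:
        return result  # small results stay inline
    path = workdir / f"tool_result_{abs(hash(result))}.txt"
    path.write_text(result)  # full response persisted to the filesystem
    preview = result[:PREVIEW_CHARS]
    return (f"[Large result offloaded to {path}; use read_file to load it]\n"
            f"Preview: {preview}...")

# Example: a ~10,000-character tool response is replaced by a short pointer.
workdir = Path(tempfile.mkdtemp())
big_result = "data line\n" * 1_000
in_context = offload_tool_result(big_result, workdir)
print(in_context.splitlines()[0])  # pointer line, not the full payload
```

The key design point is that the full content is never lost, only moved: the pointer left in context tells the agent exactly where to find it.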
2. Offloading Large Tool Inputs
File write and edit operations can leave large, redundant data in the agent’s history. As the context window fills (e.g., crosses an 85% threshold), the SDK truncates older tool calls containing this data, replacing them with a pointer to the file on disk. This prunes redundant information that is already persisted elsewhere.
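A rough sketch of this truncation pass follows. The message shape, the character-count usage estimate, and the helper names are assumptions mirroring the text, not the SDK’s real data structures; the 0.85 threshold comes from the 85% figure above.

```python
TRUNCATION_THRESHOLD = 0.85  # fraction of the context window, per the text

def estimated_usage(messages: list[dict], window_chars: int) -> float:
    """Very rough proxy: total characters in the history over the window size."""
    return sum(len(m.get("content", "")) for m in messages) / window_chars

def prune_tool_inputs(messages: list[dict], window_chars: int) -> list[dict]:
    """Replace large write/edit inputs with pointers once usage is high."""
    if estimated_usage(messages, window_chars) < TRUNCATION_THRESHOLD:
        return messages
    pruned = []
    for msg in messages[:-1]:  # leave the most recent message untouched
        if msg.get("tool") in ("write_file", "edit_file"):
            # The content is already persisted on disk, so keep only a pointer.
            msg = {**msg, "content": f"[Input elided; content persisted at {msg['path']}]"}
        pruned.append(msg)
    return pruned + messages[-1:]

# Example: a 900-character write against a 1,000-character window gets pruned.
history = [
    {"tool": "write_file", "path": "report.md", "content": "x" * 900},
    {"role": "assistant", "content": "File written."},
]
print(prune_tool_inputs(history, window_chars=1_000)[0]["content"])
# → [Input elided; content persisted at report.md]
```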
3. Summarization
When offloading alone cannot free enough space, the SDK falls back to summarization. This is a two-part process:
– In-Context Summary: An LLM generates a structured summary of the conversation, including the session’s intent, artifacts created, and next steps. This summary replaces the full message history in the agent’s working memory.
– Filesystem Preservation: The complete, original conversation history is written to the filesystem as a canonical record.
This approach ensures the agent maintains high-level awareness of its goals while retaining the ability to recover specific details by searching the filesystem.
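The two-part structure can be sketched as follows. The function names, the summary fields (intent, artifacts, next steps, taken from the description above), and the stub summarizer are all assumptions; in a real agent the summarizer would be an LLM call.

```python
import json
import tempfile
from pathlib import Path

def compact_history(messages: list[dict], summarize, workdir: Path) -> list[dict]:
    """Archive the full history, then replace it with a structured summary."""
    # Filesystem preservation: the complete transcript becomes the canonical record.
    archive = workdir / "conversation_history.json"
    archive.write_text(json.dumps(messages, indent=2))
    # In-context summary: replaces the full message history in working memory.
    summary = summarize(messages)
    return [{
        "role": "system",
        "content": (f"Summary of prior session (full log: {archive})\n"
                    f"Intent: {summary['intent']}\n"
                    f"Artifacts: {', '.join(summary['artifacts'])}\n"
                    f"Next steps: {summary['next_steps']}"),
    }]

# Stub standing in for the LLM summarizer.
def fake_summarize(messages):
    return {"intent": "refactor the auth module",
            "artifacts": ["auth.py", "tests/test_auth.py"],
            "next_steps": "run the full test suite"}

workdir = Path(tempfile.mkdtemp())
history = [{"role": "user", "content": f"turn {i}"} for i in range(200)]
compacted = compact_history(history, fake_summarize, workdir)
print(len(compacted))  # → 1
```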
Evaluation and Best Practices
Verifying that context compression works correctly is crucial. Instead of relying on broad benchmarks where compression events are sporadic, it is more effective to use targeted strategies.
1. Stress-Testing and Targeted Evals
– Aggressive Triggering: Artificially lower the compression threshold (e.g., to 20% of the context window instead of 85%). This forces more frequent compression events, making it easier to isolate their impact and compare configurations (such as summarization prompt variations).
– Targeted Evaluations: Use small, specific tests to validate individual mechanisms. For example, a “needle-in-the-haystack” test embeds a key fact early in a conversation, forces a summarization event, and then requires the agent to recall that fact later. Success demonstrates that the agent can recover information from the filesystem.
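A fully self-contained needle-in-the-haystack eval might look like the sketch below. The “agent” here is a stub that can only answer by searching the archived history on disk; a real eval would drive an actual agent through its filesystem tools, and the needle text and filler volume are arbitrary.

```python
import json
import tempfile
from pathlib import Path

def needle_eval(workdir: Path) -> bool:
    """Plant a fact early, force summarization, check it is still recoverable."""
    needle = "The staging database password rotates on Fridays."
    history = [{"role": "user", "content": needle}]
    history += [{"role": "user", "content": "unrelated chatter " * 30}
                for _ in range(50)]

    # Force a summarization event: archive everything, keep only a summary
    # that (by construction) omits the needle from the active context.
    archive = workdir / "history.json"
    archive.write_text(json.dumps(history))
    active_context = f"Summary: long discussion of misc topics. Full log: {archive}"
    assert needle not in active_context  # the fact is gone from working memory

    # Recall step: the stub "agent" searches the archived file for the fact,
    # mimicking a filesystem search tool.
    archived = json.loads(archive.read_text())
    return any(needle in m["content"] for m in archived)

print(needle_eval(Path(tempfile.mkdtemp())))  # → True
```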
2. Key Evaluation Criteria
When evaluating your compression strategies, focus on three areas:
– Baseline Performance: First, run the agent on representative tasks before tuning compression, so that any later regressions can be attributed to the compression changes rather than the agent itself.
– Information Recoverability: Ensure that critical information remains accessible after being offloaded or summarized away.
– Goal Drift Monitoring: Watch for failures where the agent loses track of the user’s original intent after a summarization event. This is the most insidious failure mode and can be surfaced by forcing frequent summarization.
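One cheap way to surface goal drift automatically is to check whether a post-summarization summary still mentions the user’s original objective. The keyword-overlap heuristic and the 50% cutoff below are assumptions; a more robust check might use an LLM judge.

```python
def goal_drift(original_goal_keywords: set[str], summary: str) -> bool:
    """Return True if the summary appears to have drifted from the original goal."""
    kept = {kw for kw in original_goal_keywords if kw.lower() in summary.lower()}
    # Flag drift if fewer than half the goal keywords survive summarization.
    return len(kept) / len(original_goal_keywords) < 0.5

# Example: a summary that dropped the migration objective counts as drift.
goal = {"migrate", "postgres", "zero-downtime"}
print(goal_drift(goal, "Summary: wrote some scripts, refactored tests."))   # → True
print(goal_drift(goal, "Summary: migrate to Postgres, zero-downtime plan."))  # → False
```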
By combining these compression techniques with rigorous, targeted evaluation, you can build robust agents capable of handling complex, long-running tasks without succumbing to context rot.