How to manage context windows for agents that run for hours or days
The hardest problem in building AI agents isn't the model—it's memory. When agents run for hours or days, context windows fill up. Most teams handle this poorly and wonder why their agents drift off task.
A typical agent's context holds a system prompt, tool definitions, the running conversation history, and accumulated tool outputs. With an 8K context window, you run out of space in roughly 20 turns. A 128K window (like GPT-4 Turbo) gives you more room, but retrieval quality degrades as it fills: models tend to miss material buried in the middle of a long context.
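To make the ~20-turn figure concrete, here is the back-of-the-envelope arithmetic; the system-prompt and per-turn token counts are assumptions for illustration, not measurements:

```python
# Back-of-the-envelope arithmetic behind the ~20-turn figure.
# Both token counts below are illustrative assumptions.
WINDOW        = 8_192   # 8K context window
SYSTEM_PROMPT = 1_000   # system prompt + tool definitions
PER_TURN      = 350     # user message + tool output + assistant reply

print((WINDOW - SYSTEM_PROMPT) // PER_TURN)  # -> 20 turns until overflow
```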
Tier 1, working memory (hot): what the agent needs right now. Keep this under 4K tokens.
Eviction policy: FIFO after 10 turns; keep task context pinned.
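One minimal way to get FIFO-with-pinning is a bounded deque that never holds the task context at all, so it can't scroll away. A sketch; the class and method names are mine, not from any particular framework:

```python
from collections import deque

class WorkingMemory:
    """Hot tier: a bounded FIFO of recent turns, with the task context
    pinned outside the eviction path so it can never be evicted."""

    def __init__(self, pinned_task: str, max_turns: int = 10):
        self.pinned_task = pinned_task                     # never evicted
        self.turns: deque[str] = deque(maxlen=max_turns)   # FIFO after 10 turns

    def add_turn(self, turn: str) -> None:
        self.turns.append(turn)   # oldest turn drops automatically at capacity

    def render(self) -> str:
        """Assemble the hot slice of the prompt: pinned task, then recent turns."""
        return "\n\n".join([self.pinned_task, *self.turns])
```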
Tier 2, recent memory (warm): what the agent might need soon. Store it in a fast retrieval layer (Redis, Pinecone, or Weaviate).
Eviction policy: LRU after 1 hour; compress before eviction.
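A sketch of the warm tier's policy, with an in-process dict standing in for Redis or a vector store. "Compress before eviction" is delegated to the cold store here (a sketch of one follows the next tier description); in practice that step might be byte-level compression, LLM summarization, or both. All names are illustrative:

```python
import time
from collections import OrderedDict

class WarmMemory:
    """Warm tier: LRU with a 1-hour idle TTL. Stale entries are demoted
    to the cold tier rather than discarded."""

    def __init__(self, cold_store, ttl_seconds: int = 3600):
        self._entries: OrderedDict[str, tuple[float, str]] = OrderedDict()
        self._ttl = ttl_seconds
        self._cold = cold_store          # anything with a put(key, text) method

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (time.time(), value)
        self._entries.move_to_end(key)   # most recently used goes last

    def get(self, key: str) -> str | None:
        hit = self._entries.get(key)
        if hit is None:
            return None
        self._entries.move_to_end(key)   # refresh LRU position on access
        return hit[1]

    def evict_stale(self) -> None:
        """Demote anything untouched for longer than the TTL; the cold
        store compresses on write, per 'compress before eviction'."""
        now = time.time()
        for key, (ts, value) in list(self._entries.items()):
            if now - ts > self._ttl:
                self._cold.put(key, value)
                del self._entries[key]
```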
Tier 3, long-term memory (cold): what the agent rarely needs but shouldn't forget. Store it in durable storage.
Eviction policy: never delete, but compress aggressively.
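For the cold tier, a minimal sketch with a plain dict standing in for durable storage (an object store or database in production); the class name and the choice of zlib are mine:

```python
import zlib

class ColdMemory:
    """Cold tier: append-only and aggressively compressed, never deleted."""

    def __init__(self):
        self._store: dict[str, bytes] = {}

    def put(self, key: str, value: str) -> None:
        # Level 9 trades CPU for the smallest footprint: cold records are
        # written once and read rarely. "Compress aggressively" may also
        # mean LLM summarization before byte-level compression.
        self._store[key] = zlib.compress(value.encode("utf-8"), level=9)

    def get(self, key: str) -> str:
        return zlib.decompress(self._store[key]).decode("utf-8")
```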
This is why I built r3, which implements this three-tier memory design.
Learn more in the Agent Memory Benchmark.
Allocate the window explicitly instead of letting history grow until truncation. For an 8K context window, the system prompt, pinned task context, and working memory have to share tight quarters; for a 128K context window, there is room for retrieved warm-tier results as well. One illustrative split is sketched below.
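Every number below is an illustrative assumption to adapt, not a measured recommendation; the point is that each region of the window gets an explicit budget and the total is checked against the window size:

```python
# Illustrative window budgets; every number here is an assumption.
BUDGETS = {
    8_192: {                         # 8K: working memory dominates
        "system_prompt":     1_000,
        "pinned_task":       1_000,
        "working_memory":    4_000,  # the hot tier's 4K ceiling
        "response_reserve":  2_000,
    },
    131_072: {                       # 128K: room for retrieved warm-tier results
        "system_prompt":     2_000,
        "pinned_task":       4_000,
        "working_memory":    4_000,  # keep the hot tier small even here
        "retrieved_context": 96_000,
        "response_reserve":  8_000,  # ~17K left unallocated on purpose:
    },                               # recall degrades as the window fills
}

for window, budget in BUDGETS.items():
    assert sum(budget.values()) <= window, f"budget overflows the {window}-token window"
```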
At Google, we built ML deployment agents that ran for 6-8 hours at a stretch. The result: agents stayed on task for full deployment cycles, with a drift rate under 2%.
Track these metrics (the two mechanical ones are sketched in code below):
- Context half-life: target > 4 hours
- Retrieval precision@5 (P@5): target > 90%
- Window utilization: target 60-80%
- Drift rate: target < 5%
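Of the four, P@5 and utilization are mechanical to compute; half-life and drift need task-specific labels. A minimal sketch of the mechanical two, with function names of my choosing:

```python
def precision_at_5(retrieved: list[str], relevant: set[str]) -> float:
    """P@5: fraction of the top five retrieved memories that are relevant."""
    top = retrieved[:5]
    return sum(item in relevant for item in top) / max(len(top), 1)

def window_utilization(tokens_used: int, window_size: int) -> float:
    """Target 0.6-0.8: lower wastes capacity, higher forces lossy truncation."""
    return tokens_used / window_size

# Hypothetical measurements, for illustration only
print(precision_at_5(["m1", "m2", "m7", "m4", "m9"], {"m1", "m2", "m4", "m5"}))  # 0.6
print(window_utilization(5_700, 8_192))  # ~0.70, inside the 60-80% band
```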
I work with teams to implement these frameworks in production AI systems.