AI Week Radar

Context window

The maximum number of tokens an LLM can consider per forward pass. 2026 frontier: 1M+ tokens for some models (Claude 4.7, Gemini 2.5 Pro). Bigger window ≠ better answer: recall degrades inside long contexts.

The context window is the combined input + output budget for a single LLM call, counted in tokens (one token is roughly 0.75 English words). When the spec says "200k context", that figure covers the system prompt, the conversation history, any retrieved chunks, and the room left for the model's response.
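To make the budget arithmetic concrete, here is a minimal sketch. The function names and the example token counts are hypothetical, and the 0.75 words-per-token ratio is only the rough English rule of thumb quoted above; a real tokenizer will give different counts.

```python
import math


def estimate_tokens(word_count, words_per_token=0.75):
    """Rough token estimate from an English word count (rule of thumb only)."""
    return math.ceil(word_count / words_per_token)


def remaining_output_budget(context_window, system_tokens, history_tokens, retrieved_tokens):
    """Tokens left for the model's response once every input part is counted."""
    used = system_tokens + history_tokens + retrieved_tokens
    if used > context_window:
        raise ValueError(f"input uses {used} tokens, window is {context_window}")
    return context_window - used


# Hypothetical call against a "200k context" model.
left = remaining_output_budget(
    context_window=200_000,
    system_tokens=2_000,
    history_tokens=30_000,
    retrieved_tokens=120_000,
)
print(left)                   # 48000 tokens remain for the response
print(estimate_tokens(1500))  # a 1,500-word document is roughly 2000 tokens
```

The point of the sketch is that the response competes with the input for the same window: every retrieved chunk added shrinks the space left for the answer.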

Bigger windows enable longer documents, deeper agent histories, and "fit-your-whole-codebase" workflows. But: recall is non-uniform across the window. Most models recall content from the start and end better than the middle ("lost-in-the-middle"). Above ~100k tokens, recall on specific facts often drops sharply.
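One common heuristic for working around lost-in-the-middle is to order the material so the most relevant pieces sit at the edges of the prompt. The sketch below assumes the chunks arrive already ranked from most to least relevant; the function name is hypothetical, and whether the reordering helps depends on the model.

```python
def order_for_edges(chunks_by_relevance):
    """Put the highest-ranked chunks at the start and end, the lowest in the middle.

    Input is assumed to be sorted from most to least relevant.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        # Alternate: even ranks fill from the front, odd ranks from the back.
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    return front + back[::-1]


ranked = ["A", "B", "C", "D", "E"]  # "A" is the most relevant
print(order_for_edges(ranked))      # ['A', 'C', 'E', 'D', 'B']
```

Here the two best chunks ("A" and "B") land at the start and end, and the weakest ("E") lands in the middle, where recall tends to be poorest.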

Practical answer: RAG is still relevant even at 1M context. Don't replace retrieval with brute-force context-stuffing; combine them.
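A minimal sketch of combining the two: rank candidate chunks first, then pack only the best ones into a fixed token budget instead of sending everything. The keyword-overlap scorer and the word-count token estimate are toy stand-ins (assumptions, not a real retriever or tokenizer), and all names are hypothetical.

```python
import math


def approx_tokens(text):
    """Rough token count using the ~0.75 words-per-token rule of thumb."""
    return math.ceil(len(text.split()) / 0.75)


def overlap_score(query, chunk):
    """Toy relevance score: number of distinct words shared with the query."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def select_chunks(query, chunks, token_budget):
    """Greedily keep the most relevant chunks that fit inside token_budget."""
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        if overlap_score(query, chunk) == 0:
            break  # everything after this point is irrelevant to the query
        cost = approx_tokens(chunk)
        if used + cost > token_budget:
            continue  # too large for the remaining budget, try a smaller chunk
        selected.append(chunk)
        used += cost
    return selected, used


docs = [
    "the context window is counted in tokens",
    "bananas are yellow",
    "recall drops in the middle of the context window",
]
picked, used = select_chunks("context window recall", docs, token_budget=30)
print(picked)  # the two relevant chunks, most relevant first
print(used)    # 22
```

In practice the scorer would be an embedding or hybrid search, but the shape stays the same: retrieval decides what goes in, and the budget decides how much.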
