RAG (Retrieval-Augmented Generation) — AI Week Radar glossary

RAG is the dominant production pattern for grounding LLMs in domain-specific knowledge. Instead of fine-tuning a model on your data — expensive, slow, and freezes the model on a snapshot — you keep the data in a retrieval layer (typically vector embeddings over chunks of text) and at query time fetch the top-k relevant chunks to drop into the prompt.

The wins: cheaper than fine-tuning, updates instantly when your corpus changes, and the model can cite which chunks it used. The losses: retrieval quality becomes its own engineering problem (chunking strategy, embedding choice, reranking, eval), and the context window puts a hard ceiling on how much you can stuff in.

Mature production RAG stacks include a query rewriter, a reranker, citation tracking, and an eval loop — not just the dense vector lookup.

RAG

See also

Embedding

Context window

Eval