AI engineering
in plain English.
Short, opinionated definitions of the terms we use most often. Skip the textbook — these are written for people who ship.
-
RAG
Retrieval-Augmented Generation
Looking up relevant context from an external store (vector DB, docs, your own corpus) and stuffing it into the LLM prompt before answering. Reduces hallucination and costs less than fine-tuning, but adds a retrieval failure mode of its own.
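A minimal sketch of the retrieve-then-stuff pattern. It assumes a toy in-memory corpus and uses word-overlap scoring in place of a real vector DB. `CORPUS`, `retrieve`, and `build_prompt` are hypothetical names, and no LLM is actually called.

```python
import re

# Toy corpus standing in for a vector DB or document store (hypothetical data).
CORPUS = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Premium support is available 24/7 by email.",
]

def tokenize(text: str) -> set[str]:
    # Lowercased word set: a crude stand-in for embedding similarity.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    # Rank documents by word overlap with the query and keep the top k.
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    # "Stuff" the retrieved passages into the prompt ahead of the question.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(question, CORPUS))
print(prompt)
```

The retrieval failure mode is visible here. If `retrieve` ranks the wrong passage first, the model gets a well-formed prompt built on the wrong context.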
-
MoE
Mixture of Experts
A model architecture where each forward pass activates only a fraction of the total parameters: a learned router sends each token to a small subset of "expert" sub-networks. Mixtral, DeepSeek-V3, and Llama 4 use it. Bigger total parameter count, but compute per token closer to that of a much smaller dense model.
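A toy sketch of top-k routing, assuming scalar inputs and four made-up "experts". It is not any specific model's implementation. In a real MoE layer the router logits come from a learned layer applied to the token's hidden state, and each expert is a feed-forward block. Renormalising the gate weights over only the selected experts is one common choice among several.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Four toy "experts" (hypothetical stand-ins for the feed-forward blocks of a real MoE layer).
EXPERTS = [
    lambda x: x * 2.0,
    lambda x: x + 10.0,
    lambda x: -x,
    lambda x: x * x,
]

def moe_forward(x: float, router_logits: list[float], top_k: int = 2) -> tuple[float, list[int]]:
    # Pick the top_k experts by router score (in a real model these logits are learned).
    chosen = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Renormalise gate weights over the selected experts only.
    gates = softmax([router_logits[i] for i in chosen])
    # Only the chosen experts run; the rest of the parameters sit idle for this input.
    output = sum(g * EXPERTS[i](x) for g, i in zip(gates, chosen))
    return output, chosen

out, used = moe_forward(3.0, router_logits=[2.0, 0.5, 1.0, -1.0])
print(used, round(out, 3))
```

All four experts count toward the total parameters, but each input pays for only two of them. That is the "bigger model, similar compute per token" trade.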
-
MCP
Model Context Protocol
Open standard from Anthropic (Nov 2024) for connecting AI assistants to external tools, data sources, and prompts. Think "USB for LLM tools": one protocol, many servers, any client.
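Under the hood, MCP messages are JSON-RPC 2.0. This sketch only builds the request bodies a client would send to discover and invoke a tool. The `get_weather` tool and its `city` argument are hypothetical. The sketch leaves out the transport (stdio or HTTP) and the `initialize` handshake that a real client, usually via an SDK, performs first.

```python
import json
from typing import Any, Optional

def jsonrpc_request(request_id: int, method: str, params: Optional[dict[str, Any]] = None) -> str:
    # A JSON-RPC 2.0 request; the id lets the client match the server's response.
    message: dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)

# 1. Ask the server which tools it exposes.
list_req = jsonrpc_request(1, "tools/list")
# 2. Invoke one of them (tool name and arguments are hypothetical).
call_req = jsonrpc_request(2, "tools/call", {"name": "get_weather", "arguments": {"city": "Berlin"}})
print(list_req)
print(call_req)
```

Because every server speaks the same `tools/list` / `tools/call` shape, any client can use any server. That is the "USB" part.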
-
Agentic
Agentic systems
LLM-driven loops that plan, take actions in the world (call tools, edit files, hit APIs), observe the results, and iterate, rather than just answering a single prompt. The dominant 2026 paradigm for AI engineering.
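A minimal sketch of the plan, act, observe, iterate loop. The model is replaced by a scripted `fake_llm`. The `ACTION:` / `FINAL:` text protocol and the `add` tool are hypothetical conventions for illustration, not any framework's format.

```python
from typing import Callable

# Hypothetical tool registry: name -> function taking a raw argument string.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda args: str(sum(int(n) for n in args.split(","))),
}

def fake_llm(history: list[str]) -> str:
    # Scripted stand-in for a real model: first requests a tool, then answers.
    if not any(h.startswith("OBSERVATION:") for h in history):
        return "ACTION: add 2,3"
    return "FINAL: " + history[-1].removeprefix("OBSERVATION: ")

def run_agent(task: str, llm: Callable[[list[str]], str], max_steps: int = 5) -> str:
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        reply = llm(history)                      # plan: decide the next step
        history.append(reply)
        if reply.startswith("FINAL: "):
            return reply.removeprefix("FINAL: ")
        name, _, args = reply.removeprefix("ACTION: ").partition(" ")
        result = TOOLS[name](args)                # act: run the requested tool
        history.append(f"OBSERVATION: {result}")  # observe, then loop again
    return "gave up"                              # step budget exhausted

print(run_agent("What is 2 + 3?", fake_llm))
```

The `max_steps` cap matters in practice. Without it, a confused model can loop forever.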
-
Eval
Evaluation
Systematic measurement of LLM/agent quality: accuracy, hallucination rate, latency, cost. The discipline you wish you'd started six months earlier. Without it, you're shipping vibes.
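The smallest useful eval harness: a fixed set of cases, a grader, and a couple of metrics. This sketch assumes exact-match grading and a lookup-table `toy_model` standing in for a real LLM call. `Case` and `run_eval` are hypothetical names. Real suites usually need fuzzier graders (rubrics, LLM-as-judge) and track cost as well.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    prompt: str
    expected: str

def run_eval(model: Callable[[str], str], cases: list[Case]) -> dict[str, float]:
    passed = 0
    latencies: list[float] = []
    for case in cases:
        start = time.perf_counter()
        output = model(case.prompt)
        latencies.append(time.perf_counter() - start)
        # Exact match after normalising case and whitespace: the crudest grader.
        if output.strip().lower() == case.expected.strip().lower():
            passed += 1
    return {"accuracy": passed / len(cases), "max_latency_s": max(latencies)}

# Hypothetical model under test: a lookup table standing in for an LLM call.
def toy_model(prompt: str) -> str:
    answers = {"Capital of France?": "Paris", "2 + 2?": "4", "Largest planet?": "Saturn"}
    return answers.get(prompt, "")

CASES = [
    Case("Capital of France?", "Paris"),
    Case("2 + 2?", "4"),
    Case("Largest planet?", "Jupiter"),
]
report = run_eval(toy_model, CASES)
print(report)
```

Rerunning the same `CASES` after every prompt or model change is what turns vibes into a number you can compare.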
-
Embedding
Vector embedding
A dense numeric vector that represents text (or images, audio…) in a learned semantic space. Vectors with high cosine similarity represent semantically similar content. The thing underneath most RAG pipelines.
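Cosine similarity is the dot product of two vectors divided by the product of their lengths: 1 means the same direction, 0 means unrelated (orthogonal). The 3-d vectors below are made up for illustration. Real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy "embeddings" (hypothetical values, not from any real model).
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
invoice = [0.0, 0.2, 0.95]

print(cosine_similarity(cat, kitten))   # close to 1: similar meaning
print(cosine_similarity(cat, invoice))  # close to 0: unrelated
```

A vector DB does this comparison (or an approximation of it) against millions of stored vectors to find the nearest neighbours of a query.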
-
Context window
Context window
The maximum number of tokens an LLM can consider per forward pass. 2026 frontier: 1M+ for some models (Claude 4.7, Gemini 2.5 Pro). Bigger window ≠ better answer: recall degrades inside long contexts.
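In practice you budget for the window: the prompt must fit, and many APIs also count the generated tokens against the same limit. This sketch keeps the most recent messages that fit after reserving room for the output. It assumes a crude whitespace `count_tokens`, which is a hypothetical stand-in, since real models use subword tokenizers and their counts will differ.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in: real models use subword tokenizers, so real counts differ.
    return len(text.split())

def fit_to_window(messages: list[str], window: int, reserve_for_output: int) -> list[str]:
    # Prompt tokens plus the reserved output tokens must fit inside the window.
    budget = window - reserve_for_output
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk from the most recent message backwards
        cost = count_tokens(msg)
        if used + cost > budget:
            break                   # older messages no longer fit
        kept.append(msg)
        used += cost
    return list(reversed(kept))     # restore chronological order

history = ["one two three", "four five", "six seven eight nine"]
print(fit_to_window(history, window=10, reserve_for_output=4))
```

Dropping the oldest turns is only one strategy. Summarising them or retrieving them on demand (RAG again) are common alternatives.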
-
Tool use
Tool use / function calling
The LLM emits a structured request to call an external function (search, calculator, API); the host application runs it and feeds the result back to the model in the next turn. Foundation of every agent worth shipping.
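The host side of a single tool call: parse the model's structured request, run the matching function, and package the result for the next turn. The JSON shape (`name`, `arguments`) and the `role: "tool"` reply are assumptions for illustration. Each provider's API defines its own exact schema, and `multiply` is a hypothetical tool.

```python
import json
from typing import Any, Callable

def multiply(a: float, b: float) -> float:
    return a * b

# Hypothetical registry of functions the host is willing to run.
REGISTRY: dict[str, Callable[..., Any]] = {"multiply": multiply}

def handle_tool_call(raw: str) -> dict[str, Any]:
    # The model's output is a structured request, not free text.
    call = json.loads(raw)
    fn = REGISTRY.get(call["name"])
    if fn is None:
        # Report unknown tools back to the model instead of crashing the host.
        return {"role": "tool", "name": call["name"], "error": "unknown tool"}
    result = fn(**call["arguments"])
    # This message is appended to the conversation for the model's next turn.
    return {"role": "tool", "name": call["name"], "content": json.dumps(result)}

model_output = '{"name": "multiply", "arguments": {"a": 6, "b": 7}}'
print(handle_tool_call(model_output))
```

The model never executes anything itself. The host decides what runs, which is also where permission checks belong.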