Vector embedding
Embedding
A dense numeric vector that represents text (or images, audio…) in a learned semantic space. Vectors with high cosine similarity represent semantically similar content. The building block underneath most RAG pipelines.
An embedding model maps a chunk of input to a fixed-dimensional vector (typically 384, 768, 1536, or 3072 dimensions). The training objective is set up so that semantically similar inputs end up close in the space — "how to fix a leaky tap" and "plumbing repair guide" should be neighbours; "how to fix a leaky tap" and "Mongolian throat singing" should not.
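To make "close in the space" concrete, here is a minimal sketch of cosine similarity in plain Python. The three 4-dimensional vectors are hand-made toy values chosen for illustration, not output from any real embedding model, and the variable names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (assumption: invented by hand, real models emit hundreds of dimensions).
leaky_tap = [0.9, 0.8, 0.1, 0.0]       # "how to fix a leaky tap"
plumbing_guide = [0.8, 0.9, 0.2, 0.1]  # "plumbing repair guide"
throat_singing = [0.0, 0.1, 0.9, 0.8]  # "Mongolian throat singing"

sim_related = cosine_similarity(leaky_tap, plumbing_guide)
sim_unrelated = cosine_similarity(leaky_tap, throat_singing)

print(f"tap vs plumbing:       {sim_related:.3f}")
print(f"tap vs throat singing: {sim_unrelated:.3f}")
```

With these toy values the related pair scores close to 1 and the unrelated pair close to 0, which is the behaviour a trained embedding model is meant to produce for real text.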
Production stacks usually use a hosted embedding model (OpenAI text-embedding-3, Cohere embed-v3, Voyage AI) or an open one (BGE, E5, GTE) plus a vector database (Pinecone, Weaviate, Qdrant, pgvector, Cloudflare Vectorize). The choice of embedding model matters more than the choice of DB — the DB just stores and searches what the model produced.
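The "stores and searches" role of the database can be sketched as a brute-force nearest-neighbour lookup. `TinyVectorIndex`, its `upsert` and `query` methods, the document ids, and the vectors are all hypothetical stand-ins for illustration; this is not the API of Pinecone, Qdrant, pgvector, or any other product, and real systems typically use approximate indexes rather than a full scan.

```python
import math
from typing import List, Tuple

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class TinyVectorIndex:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Vector]] = []

    def upsert(self, doc_id: str, vector: Vector) -> None:
        # Drop any existing entry with the same id, then store the new one.
        self._items = [(i, v) for i, v in self._items if i != doc_id]
        self._items.append((doc_id, vector))

    def query(self, vector: Vector, top_k: int = 3) -> List[Tuple[str, float]]:
        # Score every stored vector against the query and keep the best top_k.
        scored = [(doc_id, cosine(vector, v)) for doc_id, v in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

# Toy vectors (assumption: in a real stack these come from the embedding model).
index = TinyVectorIndex()
index.upsert("plumbing-repair-guide", [0.8, 0.9, 0.2, 0.1])
index.upsert("throat-singing-intro", [0.0, 0.1, 0.9, 0.8])
index.upsert("bathroom-renovation", [0.6, 0.7, 0.3, 0.2])

query_vector = [0.9, 0.8, 0.1, 0.0]  # stands in for "how to fix a leaky tap"
results = index.query(query_vector, top_k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The index never looks at the text itself. It only ranks the vectors it was given, so retrieval quality is bounded by how well the embedding model placed them.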
See also
- RAG (Retrieval-Augmented Generation): looking up relevant context from an external store (vector DB, docs, your own corpus) and stuffing it into the LLM prompt before answering. Reduces hallucination, costs less than fine-tuning, but adds a retrieval failure mode of its own.
- Context window: the maximum number of tokens an LLM can consider per forward pass. 2026 frontier: 1M+ for some models (Claude 4.7, Gemini 2.5 Pro). Bigger window ≠ better answer; recall degrades inside long contexts.