Keyword search fails when users phrase things differently from your content. Semantic search uses embeddings to match by meaning: 'how do I get my money back' finds your 'refund policy' page even though the two share no words.
The Pipeline
- Embed all your content once and store the vectors.
- At query time, embed the search query.
- Find the nearest content vectors and return them ranked by similarity.
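The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: `embed` here is a toy bag-of-words stand-in (an assumption made for the example) so the code runs without any model, which means it only matches on shared words. A real embedding model would be swapped in at that one function to get matching by meaning. The names `build_index` and `search` are likewise hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy stand-in for an embedding model: a unit-length bag-of-words vector.

    Assumption for illustration only. Replace with a call to a real
    embedding model to get true semantic matching.
    """
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity; both vectors are already unit length."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Step 1: embed all content once and keep the vectors."""
    return {doc_id: embed(text) for doc_id, text in docs.items()}


def search(query: str, index: dict[str, dict[str, float]], k: int = 3):
    """Steps 2 and 3: embed the query, then rank content by similarity."""
    query_vec = embed(query)
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


docs = {
    "refunds": "refund policy for returned orders",
    "shipping": "shipping times and delivery costs",
    "account": "reset your account password",
}
index = build_index(docs)
results = search("refund policy", index, k=2)
```

With more than a few thousand documents, the linear scan in `search` would typically be replaced by an approximate nearest-neighbor index or a vector database.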
Hybrid Search Wins
Pure semantic search can miss exact terms such as product codes and names. Combining keyword search (BM25) with vector search, then merging the two rankings, usually gives the best results: you get exact-match precision plus semantic recall.
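One common way to merge the two rankings is reciprocal rank fusion (RRF), which needs only each document's position in each list, not comparable scores. The sketch below assumes you already have a keyword ranking and a vector ranking as lists of document IDs; the IDs are made up for the example, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc IDs into one.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from the two retrievers, best first.
keyword_hits = ["sku-4411", "returns-faq", "refund-policy"]
vector_hits = ["refund-policy", "returns-faq", "billing-help"]

merged = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists outrank those found by only one retriever, while an exact keyword hit such as a product code still makes it into the merged list.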
Re-Ranking
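For the final polish, pass the top ~20 candidates through a cross-encoder re-ranker that scores each one against the query directly. Because it reads the query and the candidate together instead of comparing precomputed vectors, it is slower per item, which is why it runs only on a short candidate list. In return, it can noticeably improve the order of the top results users actually see.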
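The re-ranking step has a simple shape regardless of which model does the scoring. In this sketch, `rerank` and `overlap_score` are hypothetical names, and `overlap_score` is a toy word-overlap scorer standing in for a real cross-encoder so the example runs on its own. In practice `score_pair` would call a model that takes the query and the candidate text together.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[tuple[str, str]],  # (doc_id, text), best first
    score_pair: Callable[[str, str], float],
    top_n: int = 20,
    keep: int = 5,
) -> list[str]:
    """Re-score only the first top_n candidates and return the best `keep` IDs."""
    head = candidates[:top_n]
    scored = [(score_pair(query, text), doc_id) for doc_id, text in head]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:keep]]


def overlap_score(query: str, text: str) -> float:
    """Toy scorer (assumption): fraction of query words present in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


candidates = [
    ("shipping", "shipping times and delivery costs"),
    ("refunds", "how to get a refund on your order"),
    ("billing", "update your billing details"),
]
top = rerank("get a refund", candidates, overlap_score, top_n=20, keep=2)
```

Capping the input at `top_n` keeps the cost bounded: the expensive scorer runs a fixed number of times per query no matter how large the index is.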
Cache Embeddings
Embedding the same content repeatedly wastes money. Embed once on ingest, store the vectors, and only re-embed when the content changes.
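One simple way to detect "the content changed" is to store a hash of the text next to its vector and re-embed only when the hash differs. The sketch below keeps everything in memory and uses hypothetical names (`EmbeddingCache`, `fake_embed`); a real system would persist the hash and vector in its database or vector store.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """Caches one vector per document, keyed by a hash of its content."""

    def __init__(self, embed_fn: Callable[[str], list[float]]):
        self._embed_fn = embed_fn
        self._store: dict[str, tuple[str, list[float]]] = {}  # doc_id -> (hash, vector)
        self.embed_calls = 0  # counts how often the embedding function actually ran

    def get(self, doc_id: str, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cached = self._store.get(doc_id)
        if cached is not None and cached[0] == digest:
            return cached[1]  # content unchanged: reuse the stored vector
        self.embed_calls += 1
        vector = self._embed_fn(text)
        self._store[doc_id] = (digest, vector)
        return vector


def fake_embed(text: str) -> list[float]:
    """Placeholder for a paid embedding call (assumption for the example)."""
    return [float(len(text))]


cache = EmbeddingCache(fake_embed)
first = cache.get("refunds", "Refund policy v1")
second = cache.get("refunds", "Refund policy v1")          # cache hit
third = cache.get("refunds", "Refund policy v2 updated")   # content changed
```

Here the embedding function runs twice for three lookups: once on ingest and once after the edit.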
