Key Takeaways
- Inverted index: maps each word to the list of documents containing it. It is the foundation of text search.
- Elasticsearch: a distributed inverted index with full-text search, aggregations, and near-real-time indexing.
- Vector search: embeds queries and documents as vectors and finds nearest neighbors, which enables semantic search.
- Modern hybrid search: combines keyword (BM25) and semantic (vector) scores, which often gives better relevance than either approach alone.
Building Search That Understands Intent
Search is a core component of nearly every application: e-commerce product search, document search, log search, code search. The fundamental data structure is the inverted index, but modern systems layer ML ranking and semantic understanding on top.
Search Technologies
| Technology | Type | Typical query latency | Best For |
|---|---|---|---|
| Elasticsearch/OpenSearch | Inverted index + BM25 | 10-50ms | Full-text search, log search, e-commerce |
| Pinecone / Weaviate / Milvus | Vector database | 5-20ms | Semantic search, AI/RAG applications |
| Typesense / Meilisearch | Typo-tolerant search | 1-5ms | Autocomplete, site search, small-medium datasets |
| Algolia | Managed search API | 1-5ms | E-commerce, site search, developer-friendly |
Search Architecture
Inverted index: for doc1 = 'The quick brown fox', the index stores 'quick'→[doc1], 'brown'→[doc1], and 'fox'→[doc1, doc3], where doc3 is another document that also contains 'fox'.
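The following is a minimal sketch of building such an index in Python. It assumes a toy analyzer that lowercases text and splits on non-alphanumeric characters; real engines such as Elasticsearch use configurable analyzers instead. The texts of doc2 and doc3 are hypothetical, chosen so that 'fox' appears in doc1 and doc3.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Toy analyzer: lowercase, then split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sorted posting lists let queries intersect them with a linear merge.
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical corpus: only doc1's text comes from the example above.
docs = {
    "doc1": "The quick brown fox",
    "doc2": "A lazy dog sleeps",
    "doc3": "The fox jumps",
}
index = build_inverted_index(docs)
print(index["quick"])  # ['doc1']
print(index["fox"])    # ['doc1', 'doc3']
```

Stop-word removal and stemming are left out, so a common word like 'the' gets its own posting list too.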
At query time, look up each query term in the index and intersect the document lists (for an AND query), then score each matching document with TF-IDF or BM25.
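The query-time steps can be sketched as follows for an AND query, using the same toy tokenizer. Posting lists stay sorted, so two lists intersect with a linear merge. Matches are ranked with the Okapi BM25 formula, assuming Lucene's default parameters (k1 = 1.2, b = 0.75) and its smoothed IDF, log(1 + (N − df + 0.5) / (df + 0.5)). The function names and corpus are illustrative, not any engine's API. Corpus statistics are recomputed per call for clarity, whereas a real engine stores them in the index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Same toy analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def intersect(a, b):
    """Intersect two sorted posting lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def bm25(terms, doc_id, doc_tokens, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for the given query terms."""
    n = len(doc_tokens)
    avgdl = sum(len(t) for t in doc_tokens.values()) / n
    tf = Counter(doc_tokens[doc_id])
    dl = len(doc_tokens[doc_id])
    score = 0.0
    for term in terms:
        if tf[term] == 0:
            continue
        df = sum(1 for toks in doc_tokens.values() if term in toks)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
    return score


def search_and(query, docs):
    """Return (doc_id, score) pairs for documents containing every query term, best first."""
    doc_tokens = {d: tokenize(t) for d, t in docs.items()}
    index = {}
    for d in sorted(doc_tokens):  # visiting IDs in order keeps posting lists sorted
        for term in set(doc_tokens[d]):
            index.setdefault(term, []).append(d)
    terms = tokenize(query)
    if not terms:
        return []
    # Intersect the shortest posting lists first to keep intermediate results small.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    matches = postings[0]
    for plist in postings[1:]:
        matches = intersect(matches, plist)
    scored = [(d, bm25(terms, d, doc_tokens)) for d in matches]
    return sorted(scored, key=lambda x: x[1], reverse=True)


docs = {  # hypothetical corpus
    "doc1": "The quick brown fox",
    "doc2": "A lazy dog sleeps",
    "doc3": "The fox jumps",
}
print([d for d, _ in search_and("the fox", docs)])  # ['doc3', 'doc1']
print(search_and("fox dog", docs))                  # []
```

doc3 outranks doc1 because both contain each query term once, and BM25's length normalization (the `b` term) favors the shorter document.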
Elasticsearch shards the index across nodes for scalability.
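The following is a toy sketch of document-based sharding with scatter-gather search. Each document goes to one shard, chosen by hashing its ID modulo the shard count. Elasticsearch routes by a hash of a routing value (the document ID by default), but `zlib.crc32` here is only a stand-in for its internal hash function. A query is sent to every shard, each shard returns its local top-k, and a coordinator merges those into a global top-k. The shard count and the term-count scoring are hypothetical placeholders, not BM25.

```python
import heapq
import zlib

NUM_SHARDS = 3  # hypothetical cluster with three primary shards


def shard_for(doc_id, num_shards=NUM_SHARDS):
    # Stable hash of the routing key modulo the shard count.
    # crc32 is a stand-in for Elasticsearch's own internal routing hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index_docs(docs):
    """Place each document on exactly one shard."""
    shards = [{} for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id)][doc_id] = text
    return shards


def search_shard(shard, term, k):
    # Placeholder local scoring: how often the term occurs in each document.
    scored = [(text.lower().split().count(term), doc_id) for doc_id, text in shard.items()]
    return heapq.nlargest(k, [hit for hit in scored if hit[0] > 0])


def scatter_gather(shards, term, k=2):
    # Scatter: every shard computes its local top-k. Gather: merge into a global top-k.
    partials = [hit for shard in shards for hit in search_shard(shard, term, k)]
    return heapq.nlargest(k, partials)


docs = {
    "doc1": "the quick brown fox",
    "doc2": "a lazy dog",
    "doc3": "the fox saw another fox",
}
shards = index_docs(docs)
print(scatter_gather(shards, "fox"))  # [(2, 'doc3'), (1, 'doc1')]
```

With this purely local scoring, the merged top-k is exact. With BM25, each shard computes IDF from its own documents, so scores can differ slightly from a single-index computation.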
Vector search: embed both queries and documents as dense vectors using transformer models.
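Find the document vectors nearest to the query vector using Approximate Nearest Neighbor (ANN) algorithms such as HNSW (graph-based) and IVF (cluster-based inverted lists). These trade a little accuracy for much faster search than a brute-force scan.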
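The IVF idea can be sketched in pure Python as below, under simplifying assumptions: squared Euclidean distance, a few rounds of k-means to pick centroids, and hypothetical names (`IVFIndex`, `n_lists`, `nprobe`). Each vector is stored in the list of its nearest centroid. At query time only the lists of the `nprobe` closest centroids are scanned, so results are approximate. Probing every list reduces to an exact brute-force scan, which the usage below can check. HNSW is a different, graph-based approach and is not shown.

```python
import random


def dist2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[min(range(k), key=lambda c: dist2(v, centroids[c]))].append(v)
        # Move each centroid to its bucket's mean; keep it if the bucket is empty.
        centroids = [
            [sum(xs) / len(b) for xs in zip(*b)] if b else centroids[i]
            for i, b in enumerate(buckets)
        ]
    return centroids


class IVFIndex:
    """Inverted-file ANN index: vector IDs are bucketed by nearest centroid."""

    def __init__(self, vectors, n_lists=4, seed=0):
        self.vectors = vectors
        self.centroids = kmeans(vectors, n_lists, seed=seed)
        self.lists = [[] for _ in self.centroids]
        for i, v in enumerate(vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append(i)

    def _nearest_centroids(self, q, n):
        return sorted(range(len(self.centroids)), key=lambda c: dist2(q, self.centroids[c]))[:n]

    def search(self, q, k=3, nprobe=1):
        # Scan only the lists of the nprobe closest centroids (approximate).
        candidates = [i for c in self._nearest_centroids(q, nprobe) for i in self.lists[c]]
        return sorted(candidates, key=lambda i: dist2(q, self.vectors[i]))[:k]


def exact_search(vectors, q, k=3):
    # Brute-force baseline: compare the query against every vector.
    return sorted(range(len(vectors)), key=lambda i: dist2(q, vectors[i]))[:k]


rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
index = IVFIndex(vectors, n_lists=4)
query = [rng.gauss(0, 1) for _ in range(8)]
print(index.search(query, k=5, nprobe=1))  # approximate: scans one list
print(exact_search(vectors, query, k=5))   # exact: scans all 200 vectors
```

Raising `nprobe` scans more lists, which improves recall at the cost of more distance computations. This is the same speed/accuracy dial that production ANN indexes expose.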
This captures meaning: 'affordable laptop' can match 'budget notebook computer' even though the two phrases share no words, which keyword search alone cannot do without hand-written synonym rules.
Hybrid search: combine the keyword score (BM25) and the semantic score (vector similarity) with weighted fusion.
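The following is a minimal sketch of weighted fusion, assuming each retriever returns a dict mapping document ID to score. BM25 scores are unbounded while cosine similarities lie in [-1, 1], so both sets are min-max normalized to [0, 1] before mixing. `alpha` is a hypothetical parameter that weights the keyword side. The scores in the usage lines are made up for illustration.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def hybrid_scores(bm25: dict[str, float], vector: dict[str, float], alpha: float = 0.5):
    """Weighted fusion: alpha * keyword + (1 - alpha) * semantic, best first.

    A document missing from one retriever's results gets 0 for that component.
    """
    kw, sem = min_max(bm25), min_max(vector)
    fused = {
        d: alpha * kw.get(d, 0.0) + (1 - alpha) * sem.get(d, 0.0)
        for d in set(kw) | set(sem)
    }
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)


bm25_scores = {"doc1": 12.0, "doc2": 4.0, "doc4": 8.0}     # hypothetical BM25 scores
vector_scores = {"doc2": 0.91, "doc3": 0.85, "doc1": 0.40}  # hypothetical cosine similarities
print([d for d, _ in hybrid_scores(bm25_scores, vector_scores, alpha=0.6)])
# ['doc1', 'doc2', 'doc3', 'doc4']
```

Because a missing score counts as 0, a document found by only one retriever (doc3 or doc4 here) can still rank well if that retriever scores it highly. `alpha` is the knob tuned per application.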
Keyword matching gives high precision on exact matches; semantic matching gives high recall on conceptual matches.
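Precision and recall, as used above, can be computed per query against a set of documents judged relevant. The judgments and result lists below are hypothetical. They are chosen only to illustrate the pattern the text describes and are not measurements of any system.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = relevant hits / results returned; recall = relevant hits / all relevant."""
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}        # hypothetical relevance judgments
keyword_hits = ["d1", "d2"]                # few results, all exact matches
semantic_hits = ["d1", "d2", "d3", "d5"]   # more conceptual matches, one off-topic
print(precision_recall(keyword_hits, relevant))   # (1.0, 0.5)
print(precision_recall(semantic_hits, relevant))  # (0.75, 0.75)
```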
Together they get the best of both. Elasticsearch 8.x supports both keyword and vector (kNN) search natively.
Advantages
- Inverted indexes provide fast exact-match search
- Vector search enables semantic understanding
- Hybrid search combines precision and recall
Disadvantages
- Elasticsearch clusters are resource-intensive
- Vector search requires embedding model infrastructure
- Search relevance tuning is an ongoing effort
Test Your Understanding
What advantage does vector search have over keyword search?