Advanced · 25 min read · Topic 11.3

Search systems

Inverted index, Elasticsearch, relevance scoring, vector search, hybrid search

๐Ÿ”Key Takeaways

  1. Inverted index: maps each word → list of documents containing it. Foundation of text search.
  2. Elasticsearch: distributed inverted index with full-text search, aggregations, and near-real-time indexing.
  3. Vector search: embed queries and documents as vectors, then find nearest neighbors. This enables semantic search.
  4. Modern hybrid search: combine keyword (BM25) + semantic (vector) scores for best results.

Building Search That Understands Intent

Search is a core component of nearly every application: e-commerce product search, document search, log search, and code search. The fundamental data structure is the inverted index, but modern systems layer ML ranking and semantic understanding on top.

Search Technologies

Technology                   | Type                  | Latency  | Best For
Elasticsearch / OpenSearch   | Inverted index + BM25 | 10-50 ms | Full-text search, log search, e-commerce
Pinecone / Weaviate / Milvus | Vector database       | 5-20 ms  | Semantic search, AI/RAG applications
Typesense / Meilisearch      | Typo-tolerant search  | 1-5 ms   | Autocomplete, site search, small-medium datasets
Algolia                      | Managed search API    | 1-5 ms   | E-commerce, site search, developer-friendly

Search Architecture

Document: 'The quick brown fox'. Index: 'quick'→[doc1], 'brown'→[doc1], 'fox'→[doc1,doc3].

At query time, look up each query term in the index and intersect the document lists (for AND queries; OR queries take the union). Score the matching documents by TF-IDF or BM25.
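The lookup, intersect, and score steps can be sketched in a few lines of Python. This is a minimal illustration, not how Elasticsearch is implemented: `tokenize`, `build_index`, and `bm25_search` are hypothetical helper names, the contents of doc2 and doc3 are invented for the example, the tokenizer does not remove stop words such as "the", and `k1=1.2`, `b=0.75` are assumed as common BM25 defaults.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; real analyzers also strip punctuation and stem."""
    return text.lower().split()

def build_index(docs):
    """Map each term to {doc_id: term frequency}; also return each document's length."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths

def bm25_search(query, index, lengths, k1=1.2, b=0.75):
    """AND query: keep documents containing every term, then rank them by BM25."""
    terms = tokenize(query)
    postings = [index.get(t, {}) for t in terms]
    if not postings:
        return []
    # Intersect the posting lists so only documents with all terms survive.
    matches = set(postings[0])
    for p in postings[1:]:
        matches &= set(p)
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = {}
    for doc_id in matches:
        score = 0.0
        for term in terms:
            tf = index[term][doc_id]
            df = len(index[term])  # number of documents containing the term
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# doc1 comes from the text above; doc2 and doc3 are made up for illustration.
docs = {
    "doc1": "The quick brown fox",
    "doc2": "A lazy dog sleeps",
    "doc3": "The fox hunts at night",
}
index, lengths = build_index(docs)
results = bm25_search("quick fox", index, lengths)
```

Here `index["fox"]` holds entries for doc1 and doc3, while the query "quick fox" matches only doc1 because the intersection drops documents missing either term.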

Elasticsearch shards the index across nodes for scalability.
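As a rough sketch of what sharding implies, the snippet below routes each document to one shard by hashing its ID, then answers a query by asking every shard and merging the local hits. Everything here is an assumption for illustration: `NUM_SHARDS`, `shard_for`, `index_docs`, and `search` are hypothetical names, `zlib.crc32` stands in for whatever hash a real engine uses, and each shard simply scans its documents instead of keeping a real inverted index.

```python
import heapq
import zlib

NUM_SHARDS = 3  # hypothetical cluster size

def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard from a stable hash of the document ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def index_docs(docs):
    """Store each tokenized document on exactly one shard."""
    shards = [dict() for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id)][doc_id] = text.lower().split()
    return shards

def search(shards, term, top_k=2):
    """Scatter the query to every shard, then gather and merge the local hits."""
    hits = []
    for shard in shards:
        for doc_id, tokens in shard.items():
            tf = tokens.count(term)  # toy relevance score: raw term frequency
            if tf:
                hits.append((tf, doc_id))
    return [doc_id for tf, doc_id in heapq.nlargest(top_k, hits)]

docs = {"a": "fox fox", "b": "fox", "c": "dog"}
shards = index_docs(docs)
top = search(shards, "fox")
```

The point of the design is that indexing load and storage are split across shards, while a query still sees every document because the coordinator merges results from all of them.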

Embed both queries and documents as dense vectors using transformer models.

Find nearest vectors using Approximate Nearest Neighbor (ANN) algorithms: HNSW, IVF.

Captures meaning: 'affordable laptop' matches 'budget notebook computer', which is impossible with keyword search alone.
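To make the nearest-neighbor idea concrete, here is a small sketch that ranks documents by cosine similarity. Note the substitutions: it performs exact brute-force search rather than an ANN algorithm such as HNSW or IVF, and the 3-dimensional vectors are hand-made stand-ins, not the output of any real embedding model. The names `cosine` and `nearest` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, doc_vecs, k=2):
    """Exact k-nearest-neighbor search: score every document, keep the top k."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]

# Hand-made toy "embeddings"; the axes loosely mean (low price, computer, animal).
doc_vecs = {
    "budget notebook computer": [0.9, 0.8, 0.0],
    "premium gaming desktop": [0.1, 0.9, 0.0],
    "quick brown fox": [0.0, 0.0, 1.0],
}
query_vec = [0.8, 0.9, 0.1]  # stands in for the query "affordable laptop"
top = nearest(query_vec, doc_vecs, k=2)
```

With these toy vectors, "budget notebook computer" ranks first even though it shares no words with the query. Brute force costs one comparison per document, which is why production systems switch to ANN indexes once the collection grows.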

Combine keyword (BM25 score) and semantic (vector similarity score) with weighted fusion.

Keyword: high precision for exact matches. Semantic: high recall for conceptual matches.

Best of both worlds. Modern Elasticsearch 8.x supports both natively.
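One simple way to realize weighted fusion is sketched below, under stated assumptions: BM25 scores and vector similarities live on different scales, so each set is min-max normalized to [0, 1] before a weighted sum. The helper names `min_max` and `hybrid`, the weight `alpha`, and all score values are hypothetical. Other fusion methods exist (reciprocal rank fusion is a common alternative), and this is not a description of Elasticsearch's internal method.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so the two score types become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid(bm25_scores, vector_scores, alpha=0.5):
    """Weighted sum: alpha weights the keyword score, (1 - alpha) the semantic score."""
    kw = min_max(bm25_scores) if bm25_scores else {}
    sem = min_max(vector_scores) if vector_scores else {}
    fused = {
        doc_id: alpha * kw.get(doc_id, 0.0) + (1 - alpha) * sem.get(doc_id, 0.0)
        for doc_id in set(kw) | set(sem)  # a doc missing from one list scores 0 there
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Made-up scores for illustration only.
bm25_scores = {"doc1": 7.2, "doc2": 5.0, "doc4": 3.1}
vector_scores = {"doc2": 0.91, "doc3": 0.88, "doc1": 0.40}
ranked = hybrid(bm25_scores, vector_scores, alpha=0.5)
```

In this toy data doc2 ends up first because it scores reasonably on both signals, while doc1 (best keyword match) and doc3 (strong semantic match only) follow. Raising `alpha` toward 1.0 shifts the ranking back toward pure keyword order.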

Advantages

  • Inverted indexes provide fast exact-match search
  • Vector search enables semantic understanding
  • Hybrid search combines precision and recall

Disadvantages

  • Elasticsearch clusters are resource-intensive
  • Vector search requires embedding model infrastructure
  • Search relevance tuning is an ongoing effort

Test Your Understanding


What advantage does vector search have over keyword search?