Why Semantic Search?
Traditional keyword search fails when:

- Queries use different terminology than documents (“car” vs “automobile”)
- Users ask questions instead of keywords (“what’s the best laptop for coding?”)
- Context matters (“jaguar” the animal vs the car brand)
- Multilingual content requires cross-language matching
Object Decomposition
Implementation Steps
1. Create a Bucket for Content
2. Define a Collection with Text Embeddings
- sentence – Split on sentence boundaries (best for Q&A)
- paragraph – Preserve larger context (best for long-form content)
- fixed – Fixed token windows (predictable chunk sizes)
- semantic – Use model to detect topic shifts (experimental)
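As an illustration of the sentence strategy, a minimal sentence-boundary chunker might greedily pack whole sentences into size-bounded chunks. This is a hypothetical sketch, not the product's implementation, and it measures characters rather than model tokens:

```python
import re

def chunk_by_sentence(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) + 1 > max_chars:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(current)
    return chunks

doc = ("Semantic search matches meaning, not keywords. "
       "It uses embeddings. Chunking controls what each vector represents. "
       "Smaller chunks give sharper matches.")
for c in chunk_by_sentence(doc, max_chars=80):
    print(c)
```

Because chunks never split mid-sentence, each embedded unit stays a coherent statement, which is why this strategy suits Q&A-style retrieval.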
3. Ingest Documents
4. Create a Basic Semantic Retriever
5. Execute Search
Model Evolution & A/B Testing
You can test new models without disrupting production by creating parallel collections, comparing results, and migrating seamlessly.

Test New Embedding Models
Compare Retrieval Quality
- Precision@10: v1 (0.72) vs v2 (0.84) → +16% improvement
- Latency P95: v1 (45ms) vs v2 (68ms) → acceptable tradeoff
- User CTR: v1 (34%) vs v2 (41%) → +7% engagement
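The Precision@10 figures above come from a labeled evaluation set: for each query you need the set of documents judged relevant. A minimal sketch of the metric itself:

```python
def precision_at_k(retrieved, relevant, k=10):
    """Fraction of the top-k retrieved doc IDs that are in the relevant set."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)

retrieved = ["d1", "d7", "d3", "d9", "d2"]   # ranked results for one query
relevant = {"d1", "d2", "d3"}                # human-judged relevant docs
print(precision_at_k(retrieved, relevant, k=5))  # → 0.6
```

Averaging this over every query in the evaluation set gives the per-model number used in the comparison.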
Migrate When Ready
Advanced Patterns
Hybrid Search (Dense + Sparse)
Combine vector embeddings with BM25 keyword matching for the best of both worlds:

- Queries mixing natural language + exact terminology (“React hooks API reference”)
- Domain-specific jargon where semantics struggle (product SKUs, error codes)
- Multilingual content where BM25 handles exact matches and vectors handle translations
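One common way to merge the dense and sparse result lists is reciprocal rank fusion (RRF), which needs only each system's ranking, not score values on a comparable scale. A sketch (the `k=60` constant is the conventional default from the RRF literature; whether this product uses RRF internally is not stated above):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list via RRF scores."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d2", "d5", "d1"]   # vector-similarity ranking
sparse = ["d5", "d9", "d2"]   # BM25 keyword ranking
print(reciprocal_rank_fusion([dense, sparse]))  # → ['d5', 'd2', 'd9', 'd1']
```

Documents that rank well in both lists (here `d5` and `d2`) rise to the top, while documents seen by only one system still survive with lower scores.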
Filter Before Search (for Efficiency)
Apply structured filters before expensive vector operations.

Rerank with Cross-Encoder

Use a cross-encoder model for final reranking (higher accuracy, slower).
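A cross-encoder rescoring pass takes the top candidates from the first-pass retriever and scores each (query, document) pair jointly. The shape of that stage can be sketched as follows; `pair_score` here is a cheap stand-in, where a real system would run a cross-encoder model over each pair:

```python
def pair_score(query, doc):
    """Stand-in scorer: a real cross-encoder runs a model over the pair."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms | d_terms), 1)

def rerank(query, candidates, top_n=2):
    """Rescore first-pass candidates and keep only the best top_n."""
    ranked = sorted(candidates, key=lambda d: pair_score(query, d), reverse=True)
    return ranked[:top_n]

candidates = ["react hooks api reference", "vue composition api", "react tutorial"]
print(rerank("react hooks api", candidates, top_n=1))
```

Because the expensive scorer only sees the candidate shortlist (typically tens of documents), the accuracy gain costs far less than running it over the whole corpus.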
Generate multiple query variations to improve recall.

Feedback Loop with Interactions

Record user interactions to improve relevance over time.

Chunking Best Practices
| Content Type | Recommended Strategy | Chunk Size | Overlap |
|---|---|---|---|
| Technical docs | sentence | 256-512 tokens | 50 tokens |
| Long-form articles | paragraph | 512-1024 tokens | 100 tokens |
| FAQs | semantic (detect Q&A boundaries) | Variable | 0 |
| Code snippets | fixed (preserve syntax) | 256 tokens | 20 tokens |
| Product descriptions | sentence | 128-256 tokens | 25 tokens |
- Smaller chunks = better precision, worse recall
- Larger chunks = more context, but noisier matches
- Overlap prevents splitting relevant context across boundaries
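The chunk-size and overlap columns above amount to a sliding window over the token stream. A minimal sketch, using whitespace tokens as a stand-in for model tokens:

```python
def fixed_chunks(tokens, size=256, overlap=20):
    """Slide a window of `size` tokens, stepping by size - overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = [f"t{i}" for i in range(10)]
for chunk in fixed_chunks(tokens, size=4, overlap=1):
    print(chunk)
```

Each chunk repeats the last `overlap` tokens of its predecessor, so a sentence that straddles a boundary still appears whole in at least one chunk.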
Model Selection
| Model | Latency | Accuracy | Use Case |
|---|---|---|---|
| multilingual-e5-base | Fast | Good | High-volume, cost-sensitive |
| multilingual-e5-large-instruct | Medium | Excellent | General-purpose semantic search |
| bge-large-en-v1.5 | Medium | Excellent | English-only, high accuracy |
| openai/text-embedding-3-large | Slow | Best | Premium use cases, multilingual |
| cohere/embed-english-v3 | Medium | Excellent | Domain-specific fine-tuning |
Performance Optimization
1. Cache Aggressively
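Query embeddings are a natural cache target, since popular queries repeat and embedding calls dominate latency. A minimal memoization sketch; the body of `embed_query` is a placeholder for the real embedding-model call:

```python
from functools import lru_cache

CALLS = {"count": 0}  # instrumentation to show cache hits

@lru_cache(maxsize=10_000)
def embed_query(query: str):
    """Placeholder: a real implementation calls the embedding model here."""
    CALLS["count"] += 1
    return tuple(float(len(tok)) for tok in query.lower().split())

embed_query("best laptop for coding")
embed_query("best laptop for coding")  # served from cache, no second model call
print(CALLS["count"])  # → 1
```

Normalizing queries before hashing (lowercasing, trimming whitespace) raises the hit rate further.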
2. Tune Limit Values
3. Use Pre-Filters
Filter by category, date, or other metadata before vector search to reduce search space.

4. Monitor Analytics
Use Case Examples
Customer Support Knowledge Base
Ingest help articles, FAQs, and troubleshooting guides. Use hybrid search to handle both natural language questions (“Why isn’t my API key working?”) and exact error codes (“401 Unauthorized”).
E-Commerce Product Search
Embed product descriptions and use semantic search to match queries like “comfortable running shoes for long distances” to products without exact keyword matches. Add filters for price, brand, and availability.
Research Paper Discovery
Chunk academic papers by section, embed abstracts and full text. Enable researchers to find relevant papers by concept (“neural architecture search for vision transformers”) rather than exact citation matching.
Internal Document Search
Index company wikis, Slack archives, and meeting notes. Use namespaces to isolate teams and apply fine-grained access control. Track interactions to surface most-helpful content.
Legal Document Retrieval
Chunk legal contracts and case law. Use reranking to prioritize exact precedent matches. Apply filters for jurisdiction, date ranges, and case types.
Evaluation & Tuning
Offline Evaluation
Create a golden dataset with query-document pairs.

A/B Testing

Create retriever variants and compare:

- Click-through rate (CTR)
- Time to first click
- Zero-result queries
- Negative feedback rate
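These online metrics can be aggregated straight from an interaction log. A sketch over hypothetical log records (the field names `num_results` and `clicked` are illustrative, not the product's schema):

```python
def search_metrics(log):
    """Compute CTR and zero-result rate from a list of search-event dicts."""
    total = len(log)
    if total == 0:
        return {"ctr": 0.0, "zero_result_rate": 0.0}
    clicks = sum(1 for event in log if event["clicked"])
    zero = sum(1 for event in log if event["num_results"] == 0)
    return {"ctr": clicks / total, "zero_result_rate": zero / total}

log = [
    {"query": "reset password", "num_results": 8, "clicked": True},
    {"query": "api key invalid", "num_results": 5, "clicked": True},
    {"query": "flurble error",   "num_results": 0, "clicked": False},
    {"query": "billing cycle",   "num_results": 3, "clicked": False},
]
print(search_metrics(log))  # → {'ctr': 0.5, 'zero_result_rate': 0.25}
```

Segmenting the same aggregation by retriever variant gives the per-variant numbers the A/B comparison needs.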
Next Steps
- Explore Retrievers for full stage catalog
- Learn Filters syntax for advanced queries
- Review Interactions for tracking user signals
- Check Feature Extractors for model options

