Business Impact:  Enable users to find what they mean, not just what they type. Reduce zero-result queries by 60-80%, increase conversion rates, and surface relevant content even with typos, synonyms, or questions.
Semantic search goes beyond keyword matching to understand query intent and find conceptually relevant results. This pattern combines dense vector embeddings, sparse representations (BM25), and optional reranking for state-of-the-art retrieval accuracy. 
Why Semantic Search?

Traditional keyword search fails when:
Queries use different terminology than documents (“car” vs “automobile”) 
Users ask questions instead of keywords (“what’s the best laptop for coding?”) 
Context matters (“jaguar” the animal vs the car brand) 
Multilingual content requires cross-language matching 
 
Semantic search solves these by mapping text to high-dimensional vectors where semantically similar content clusters together, regardless of exact word matches. 
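To see the idea in miniature, here is a hedged sketch using the open-source sentence-transformers library (not Mixpeek code); the model ID and example texts are illustrative assumptions:

```python
# Illustrative only: "car" and "automobile" land close together in embedding space
# even though they share no keywords.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/multilingual-e5-base")  # assumed open-source model ID

docs = ["automobile maintenance schedule", "jaguar habitat and diet"]
query = "car service intervals"

doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

for doc, score in zip(docs, util.cos_sim(query_vec, doc_vecs)[0]):
    print(f"{float(score):.2f}  {doc}")
# The automobile document scores noticeably higher despite zero keyword overlap.
```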
vs Building It Yourself

| Task | Without Mixpeek | With Mixpeek |
|---|---|---|
| Deploy embedding models (CPU/GPU) | 4-6 weeks | Instant |
| Setup vector database (Qdrant/Pinecone) | 2-3 weeks | Instant |
| Build hybrid search (dense + sparse) | 3-4 weeks | 30 minutes |
| Implement reranking pipeline | 2-3 weeks | Config change |
| A/B test different models | 2-4 weeks | 15 minutes |
| Production monitoring & caching | 2-3 weeks | Built-in |
Engineering time saved: 3-4 months • Infrastructure complexity: Zero

Key Differentiator: Hot-swap embedding models without rebuilding indexes. Test text-embedding-3-small vs multilingual-e5-large on the same data, compare retrieval quality, and switch in minutes, not weeks.
Object Decomposition

Implementation Steps

1. Create a Bucket for Content

```
POST /v1/buckets
{
  "bucket_name": "knowledge-base",
  "description": "Product documentation and FAQs",
  "schema": {
    "properties": {
      "title": { "type": "text", "required": true },
      "content": { "type": "text", "required": true },
      "category": { "type": "text" },
      "tags": { "type": "array" },
      "published_at": { "type": "datetime" }
    }
  }
}
```

2. Define a Collection with Text Embeddings

```
POST /v1/collections
{
  "collection_name": "docs-search",
  "description": "Semantic search over documentation",
  "source": { "type": "bucket", "bucket_id": "bkt_kb" },
  "feature_extractor": {
    "feature_extractor_name": "text_extractor",
    "version": "v1",
    "input_mappings": {
      "text": "content"
    },
    "parameters": {
      "model": "multilingual-e5-large-instruct",
      "chunk_strategy": "sentence",
      "chunk_size": 512,
      "chunk_overlap": 50
    },
    "field_passthrough": [
      { "source_path": "title" },
      { "source_path": "category" },
      { "source_path": "tags" },
      { "source_path": "published_at" }
    ]
  }
}
```

Chunking Strategies:
sentence – Split on sentence boundaries (best for Q&A)
paragraph – Preserve larger context (best for long-form content)
fixed – Fixed token windows (predictable chunk sizes)
semantic – Use model to detect topic shifts (experimental)
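To make the chunk_size and chunk_overlap parameters concrete, here is a rough sketch of the fixed-window strategy (illustrative only; the real extractor chunks on tokens, not whitespace words):

```python
# Illustrative sketch of the "fixed" strategy: fixed-size windows with overlap.
# chunk_size and chunk_overlap mirror the collection parameters above, but count
# whitespace-separated words here instead of model tokens.
def fixed_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    words = text.split()
    step = chunk_size - chunk_overlap
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, max(len(words) - chunk_overlap, 1), step)
    ]

doc = " ".join(f"word{i}" for i in range(1200))
chunks = fixed_chunks(doc)
print(len(chunks), "chunks; consecutive chunks share 50 words of context")
```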
3. Ingest Documents

```
POST /v1/buckets/{bucket_id}/objects
{
  "key_prefix": "/docs/api",
  "metadata": {
    "title": "Authentication Guide",
    "content": "Mixpeek uses Bearer token authentication...",
    "category": "getting-started",
    "tags": ["auth", "security", "api-keys"],
    "published_at": "2025-10-01T12:00:00Z"
  }
}
```

For bulk ingestion, use batch operations:
```
POST /v1/buckets/{bucket_id}/objects/batch
{
  "objects": [
    { "metadata": {...}, "key_prefix": "/docs/api" },
    { "metadata": {...}, "key_prefix": "/docs/sdk" }
  ]
}
```

4. Create a Basic Semantic Retriever

```
POST /v1/retrievers
{
  "retriever_name": "docs-semantic-search",
  "collection_ids": ["col_docs"],
  "input_schema": {
    "properties": {
      "query": { "type": "text", "required": true }
    }
  },
  "stages": [
    {
      "stage_name": "knn_search",
      "version": "v1",
      "parameters": {
        "feature_address": "mixpeek://text_extractor@v1/text_embedding",
        "input_mapping": { "text": "query" },
        "limit": 50
      }
    },
    {
      "stage_name": "sort",
      "version": "v1",
      "parameters": {
        "sort_by": [{ "field": "score", "direction": "desc" }]
      }
    }
  ],
  "cache_config": {
    "enabled": true,
    "ttl_seconds": 300
  }
}
```

5. Execute Search

```
POST /v1/retrievers/{retriever_id}/execute
{
  "inputs": { "query": "how do I authenticate API requests?" },
  "limit": 10,
  "return_urls": false
}
```

Response:

```
{
  "results": [
    {
      "document_id": "doc_auth_guide",
      "score": 0.89,
      "metadata": {
        "title": "Authentication Guide",
        "category": "getting-started",
        "tags": ["auth", "security"]
      }
    }
  ],
  "execution_id": "exec_123",
  "cache_hit": false,
  "stage_statistics": {
    "knn_search": { "duration_ms": 45, "results_count": 50 }
  }
}
```
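The same execute call from a Python service might look like the sketch below. The base URL, environment variables, and retriever ID are assumptions for this example, not values from the API reference:

```python
# Hedged sketch: call the retriever execute endpoint with the requests library.
import os
import requests

API_BASE = os.environ.get("MIXPEEK_API_BASE", "https://api.mixpeek.com")  # assumed base URL
API_KEY = os.environ["MIXPEEK_API_KEY"]
RETRIEVER_ID = "ret_docs_semantic"  # hypothetical retriever ID

resp = requests.post(
    f"{API_BASE}/v1/retrievers/{RETRIEVER_ID}/execute",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "inputs": {"query": "how do I authenticate API requests?"},
        "limit": 10,
        "return_urls": False,
    },
    timeout=30,
)
resp.raise_for_status()

body = resp.json()
for hit in body["results"]:
    print(f'{hit["score"]:.2f}  {hit["metadata"]["title"]}')
print("cache hit:", body["cache_hit"])
```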
Model Evolution & A/B Testing

Mixpeek lets you test new models without disrupting production. Create parallel collections, compare results, and migrate seamlessly.

Test New Embedding Models

```
# Production: Current model
POST /v1/collections
{
  "collection_name": "docs-search-v1",
  "feature_extractor": {
    "parameters": { "model": "multilingual-e5-base" }
  }
}

# Staging: Test larger model
POST /v1/collections
{
  "collection_name": "docs-search-v2",
  "feature_extractor": {
    "parameters": { "model": "multilingual-e5-large-instruct" }
  }
}
```

Compare Retrieval Quality

```
# Query both collections
POST /v1/retrievers/ret_v1/execute
{ "inputs": { "query": "authentication guide" } }

POST /v1/retrievers/ret_v2/execute
{ "inputs": { "query": "authentication guide" } }

# Compare metrics
GET /v1/analytics/retrievers/compare?baseline=ret_v1&candidate=ret_v2
```

Returns:
Precision@10: v1 (0.72) vs v2 (0.84) → +16% improvement 
Latency P95: v1 (45ms) vs v2 (68ms) → acceptable tradeoff 
User CTR: v1 (34%) vs v2 (41%) → +7% engagement 
 
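A quick client-side spot check complements the analytics comparison: query both variants with the same inputs and look at result overlap. The sketch below makes the same base URL and auth assumptions as earlier examples:

```python
# Hedged sketch: compare two retriever variants on the same sample queries.
import os
import requests

API_BASE = os.environ.get("MIXPEEK_API_BASE", "https://api.mixpeek.com")
HEADERS = {"Authorization": f"Bearer {os.environ['MIXPEEK_API_KEY']}"}

def top_ids(retriever_id: str, query: str, k: int = 10) -> list[str]:
    resp = requests.post(
        f"{API_BASE}/v1/retrievers/{retriever_id}/execute",
        headers=HEADERS,
        json={"inputs": {"query": query}, "limit": k},
        timeout=30,
    )
    resp.raise_for_status()
    return [hit["document_id"] for hit in resp.json()["results"]]

for query in ["authentication guide", "rotate api keys", "rate limits"]:
    baseline = top_ids("ret_v1", query)
    candidate = top_ids("ret_v2", query)
    overlap = len(set(baseline) & set(candidate)) / max(len(baseline), 1)
    print(f"{query!r}: top-10 overlap {overlap:.0%}")
```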
Migrate When Ready

```
# Switch retriever to new collection
PATCH /v1/retrievers/{retriever_id}
{ "collection_ids": ["col_docs_v2"] }

# Archive old collection
DELETE /v1/collections/col_docs_v1
```

Zero downtime. No index rebuild. Production stays live.

Advanced Patterns

Hybrid Search (Dense + Sparse)

Combine vector embeddings with BM25 keyword matching for best-of-both-worlds retrieval:
```
{
  "stages": [
    {
      "stage_name": "hybrid_search",
      "version": "v1",
      "parameters": {
        "queries": [
          {
            "feature_address": "mixpeek://text_extractor@v1/text_embedding",
            "input_mapping": { "text": "query" },
            "weight": 0.7
          },
          {
            "feature_address": "mixpeek://text_extractor@v1/bm25_sparse",
            "input_mapping": { "text": "query" },
            "weight": 0.3
          }
        ],
        "fusion_method": "rrf",
        "limit": 50
      }
    }
  ]
}
```

The `rrf` fusion method is sketched after the list below.

When to use:
Queries mixing natural language + exact terminology (“React hooks API reference”) 
Domain-specific jargon where semantics struggle (product SKUs, error codes) 
Multilingual content where BM25 handles exact matches and vectors handle translations 
 
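The `rrf` fusion method used above is reciprocal rank fusion: each document's fused score is the sum of 1/(k + rank) over the dense and sparse result lists. Mixpeek computes this server-side; the sketch below only illustrates the mechanics, with k = 60 as an assumed conventional default:

```python
# Illustrative reciprocal rank fusion (RRF) over two ranked lists of document IDs.
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

dense = ["doc_auth_guide", "doc_api_keys", "doc_rate_limits"]   # vector ranking
sparse = ["doc_api_keys", "doc_error_codes", "doc_auth_guide"]  # BM25 ranking

for doc_id, score in rrf([dense, sparse]):
    print(f"{score:.4f}  {doc_id}")
# Documents that appear high in both rankings float to the top.
```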
Filter Before Search (for Efficiency)

Apply structured filters before expensive vector operations:
```
{
  "stages": [
    {
      "stage_name": "filter",
      "version": "v1",
      "parameters": {
        "filters": {
          "operator": "and",
          "conditions": [
            {
              "field": "metadata.category",
              "operator": "eq",
              "value": "getting-started"
            },
            {
              "field": "metadata.published_at",
              "operator": "gte",
              "value": "2025-01-01T00:00:00Z"
            }
          ]
        }
      }
    },
    {
      "stage_name": "knn_search",
      "version": "v1",
      "parameters": {
        "feature_address": "mixpeek://text_extractor@v1/text_embedding",
        "input_mapping": { "text": "query" },
        "limit": 50
      }
    }
  ]
}
```

Rerank with Cross-Encoder

Use a cross-encoder model for final reranking (higher accuracy, slower):
```
{
  "stages": [
    {
      "stage_name": "knn_search",
      "version": "v1",
      "parameters": {
        "limit": 100   # Retrieve more candidates
      }
    },
    {
      "stage_name": "rerank",
      "version": "v1",
      "parameters": {
        "model": "cross-encoder/ms-marco-MiniLM-L-12-v2",
        "input_mapping": {
          "query": "query",
          "document": "metadata.content"
        },
        "top_k": 20   # Return top 20 after reranking
      }
    }
  ]
}
```

Query Expansion

Generate multiple query variations to improve recall:
```
{
  "stages": [
    {
      "stage_name": "llm_generation",
      "version": "v1",
      "parameters": {
        "model": "gpt-4o-mini",
        "prompt": "Generate 3 alternative phrasings of this query: {{inputs.query}}",
        "output_format": "json_array"
      }
    },
    {
      "stage_name": "knn_search",
      "version": "v1",
      "parameters": {
        "feature_address": "mixpeek://text_extractor@v1/text_embedding",
        "input_mapping": { "text": "STAGE.llm_generation.expanded_queries" },
        "limit": 50
      }
    }
  ]
}
```

Feedback Loop with Interactions

Record user interactions to improve relevance over time:
```
# User clicks on result
POST /v1/retrievers/{retriever_id}/interactions
{
  "execution_id": "exec_123",
  "document_id": "doc_auth_guide",
  "interaction_type": "click",
  "metadata": {
    "position": 1,
    "query": "how do I authenticate API requests?"
  }
}

# User provides explicit feedback
POST /v1/retrievers/{retriever_id}/interactions
{
  "execution_id": "exec_123",
  "document_id": "doc_auth_guide",
  "interaction_type": "positive_feedback"
}
```

Use signals in analytics:
```
GET /v1/analytics/retrievers/{retriever_id}/signals
```

Identify low-CTR queries or high-negative-feedback documents to refine taxonomy mappings or retrain models.
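One simple way to act on those signals is to aggregate interaction events into per-query click-through rates. The event format below is illustrative, not the actual signals response schema:

```python
# Illustrative: compute per-query CTR from interaction events you have exported.
from collections import defaultdict

events = [
    {"query": "how do I authenticate API requests?", "interaction_type": "impression"},
    {"query": "how do I authenticate API requests?", "interaction_type": "click"},
    {"query": "reset billing address", "interaction_type": "impression"},
]

impressions: dict[str, int] = defaultdict(int)
clicks: dict[str, int] = defaultdict(int)
for event in events:
    if event["interaction_type"] == "impression":
        impressions[event["query"]] += 1
    elif event["interaction_type"] == "click":
        clicks[event["query"]] += 1

for query, shown in impressions.items():
    ctr = clicks[query] / shown
    flag = "  <- review content or mappings" if ctr == 0 else ""
    print(f"{ctr:.0%}  {query}{flag}")
```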
Chunking Best Practices

| Content Type | Recommended Strategy | Chunk Size | Overlap |
|---|---|---|---|
| Technical docs | sentence | 256-512 tokens | 50 tokens |
| Long-form articles | paragraph | 512-1024 tokens | 100 tokens |
| FAQs | semantic (detect Q&A boundaries) | Variable | 0 |
| Code snippets | fixed (preserve syntax) | 256 tokens | 20 tokens |
| Product descriptions | sentence | 128-256 tokens | 25 tokens |
General rules: 
Smaller chunks = better precision, worse recall 
Larger chunks = more context, but noisier matches 
Overlap prevents splitting relevant context across boundaries 
 
Model Selection

| Model | Latency | Accuracy | Use Case |
|---|---|---|---|
| multilingual-e5-base | Fast | Good | High-volume, cost-sensitive |
| multilingual-e5-large-instruct | Medium | Excellent | General-purpose semantic search |
| bge-large-en-v1.5 | Medium | Excellent | English-only, high accuracy |
| openai/text-embedding-3-large | Slow | Best | Premium use cases, multilingual |
| cohere/embed-english-v3 | Medium | Excellent | Domain-specific fine-tuning |
Test with your data:

```
POST /v1/retrievers/debug-inference
{
  "model": "multilingual-e5-large-instruct",
  "text": "sample query",
  "return_embedding": true
}
```

1. Cache Aggressively

```
{
  "cache_config": {
    "enabled": true,
    "ttl_seconds": 600,   # 10 minutes for stable queries
    "cache_stage_names": ["knn_search", "rerank"]   # Only cache expensive stages
  }
}
```

2. Tune Limit Values

```
# Retrieve more candidates for reranking
{
  "stages": [
    { "stage_name": "knn_search", "parameters": { "limit": 200 } },
    { "stage_name": "rerank", "parameters": { "top_k": 20 } }
  ]
}
```

3. Use Pre-Filters

Filter by category, date, or other metadata before vector search to reduce search space.
4. Monitor Analytics

```
GET /v1/analytics/retrievers/{retriever_id}/performance
GET /v1/analytics/retrievers/{retriever_id}/stages
```

Identify slow stages and optimize (e.g., disable reranking for low-value queries).
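The stage_statistics block returned by execute (shown earlier) can also be aggregated from a script to spot slow stages across a sample of real queries; base URL, auth, and retriever ID remain assumptions:

```python
# Hedged sketch: aggregate per-stage duration_ms from execute responses.
import os
from collections import defaultdict

import requests

API_BASE = os.environ.get("MIXPEEK_API_BASE", "https://api.mixpeek.com")
HEADERS = {"Authorization": f"Bearer {os.environ['MIXPEEK_API_KEY']}"}
RETRIEVER_ID = "ret_docs_semantic"  # hypothetical

sample_queries = ["how do I authenticate API requests?", "webhook retries", "rate limits"]
durations: dict[str, list[float]] = defaultdict(list)

for query in sample_queries:
    resp = requests.post(
        f"{API_BASE}/v1/retrievers/{RETRIEVER_ID}/execute",
        headers=HEADERS,
        json={"inputs": {"query": query}, "limit": 10},
        timeout=30,
    )
    resp.raise_for_status()
    for stage, stats in resp.json().get("stage_statistics", {}).items():
        durations[stage].append(stats["duration_ms"])

for stage, values in sorted(durations.items(), key=lambda kv: -max(kv[1])):
    print(f"{stage}: avg {sum(values) / len(values):.0f} ms, max {max(values):.0f} ms")
```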
Use Case Examples 
Customer Support Knowledge Base
Ingest help articles, FAQs, and troubleshooting guides. Use hybrid search to handle both natural language questions (“Why isn’t my API key working?”) and exact error codes (“401 Unauthorized”). 
E-Commerce Product Search
Embed product descriptions and use semantic search to match queries like “comfortable running shoes for long distances” to products without exact keyword matches. Add filters for price, brand, and availability. 
Academic Paper Search
Chunk academic papers by section, embed abstracts and full text. Enable researchers to find relevant papers by concept (“neural architecture search for vision transformers”) rather than exact citation matching.
Internal Knowledge Search
Index company wikis, Slack archives, and meeting notes. Use namespaces to isolate teams and apply fine-grained access control. Track interactions to surface the most helpful content.
Legal Research
Chunk legal contracts and case law. Use reranking to prioritize exact precedent matches. Apply filters for jurisdiction, date ranges, and case types.
Evaluation & Tuning

Offline Evaluation

Create a golden dataset with query-document pairs:
```
POST /v1/retrievers/{retriever_id}/evaluations
{
  "test_queries": [
    {
      "query": "how to authenticate",
      "relevant_doc_ids": ["doc_auth_guide", "doc_api_keys"]
    }
  ],
  "metrics": ["precision@10", "recall@10", "mrr", "ndcg"]
}
```
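If you want to sanity-check the reported numbers, precision@k and MRR are easy to compute yourself from ranked results and the golden relevant-ID sets; this is a generic sketch, not Mixpeek's evaluation code:

```python
# Illustrative offline metrics over ranked document IDs and a golden relevance set.
def precision_at_k(ranked_ids: list[str], relevant: set[str], k: int = 10) -> float:
    return sum(doc_id in relevant for doc_id in ranked_ids[:k]) / k

def reciprocal_rank(ranked_ids: list[str], relevant: set[str]) -> float:
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

ranked = ["doc_changelog", "doc_auth_guide", "doc_api_keys"]
relevant = {"doc_auth_guide", "doc_api_keys"}

print("precision@10:", precision_at_k(ranked, relevant))      # 0.2
print("reciprocal rank:", reciprocal_rank(ranked, relevant))   # 0.5
```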
A/B Testing

Create retriever variants and compare:

```
# Variant A: Vector-only
POST /v1/retrievers { ... "retriever_name": "search-vector" }

# Variant B: Hybrid
POST /v1/retrievers { ... "retriever_name": "search-hybrid" }
```

Split traffic and monitor:
Click-through rate (CTR) 
Time to first click 
Zero-result queries 
Negative feedback rate 
 
Next Steps