Business Impact: Enable users to find what they mean, not just what they type. Reduce zero-result queries by 60-80%, increase conversion rates, and surface relevant content even with typos, synonyms, or questions.
Semantic search goes beyond keyword matching to understand query intent and find conceptually relevant results. This pattern combines dense vector embeddings, sparse representations (BM25), and optional reranking for state-of-the-art retrieval accuracy.

Why Semantic Search?

Traditional keyword search fails when:
  • Queries use different terminology than documents (“car” vs “automobile”)
  • Users ask questions instead of keywords (“what’s the best laptop for coding?”)
  • Context matters (“jaguar” the animal vs the car brand)
  • Multilingual content requires cross-language matching
Semantic search solves these by mapping text to high-dimensional vectors where semantically similar content clusters together, regardless of exact word matches.
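This is easiest to see with cosine similarity, the standard measure of closeness between embedding vectors. The sketch below uses made-up 3-dimensional vectors (real models emit hundreds or thousands of dimensions), but it shows why "car" and "automobile" can match despite sharing no characters:

```python
# Toy illustration (not the Mixpeek API): cosine similarity between
# pretend embeddings. Vectors for related concepts point in similar
# directions, so their cosine is close to 1.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical embeddings; a real model produces these from text.
car        = [0.9, 0.1, 0.2]
automobile = [0.85, 0.15, 0.25]
banana     = [0.1, 0.9, 0.3]

assert cosine(car, automobile) > cosine(car, banana)
```

Retrieval then reduces to "find the stored vectors closest to the query vector", which a vector index answers efficiently.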

Mixpeek vs Building It Yourself

| Task | Without Mixpeek | With Mixpeek |
| --- | --- | --- |
| Deploy embedding models (CPU/GPU) | 4-6 weeks | Instant |
| Setup vector database (Qdrant/Pinecone) | 2-3 weeks | Instant |
| Build hybrid search (dense + sparse) | 3-4 weeks | 30 minutes |
| Implement reranking pipeline | 2-3 weeks | Config change |
| A/B test different models | 2-4 weeks | 15 minutes |
| Production monitoring & caching | 2-3 weeks | Built-in |

Engineering time saved: 3-4 months. Infrastructure complexity: zero.
Key Differentiator: Hot-swap embedding models without rebuilding indexes. Test text-embedding-3-small vs multilingual-e5-large on the same data, compare retrieval quality, and switch in minutes—not weeks.

Implementation Steps

1. Create a Bucket for Content

POST /v1/buckets
{
  "bucket_name": "knowledge-base",
  "description": "Product documentation and FAQs",
  "schema": {
    "properties": {
      "title": { "type": "text", "required": true },
      "content": { "type": "text", "required": true },
      "category": { "type": "text" },
      "tags": { "type": "array" },
      "published_at": { "type": "datetime" }
    }
  }
}

2. Define a Collection with Text Embeddings

POST /v1/collections
{
  "collection_name": "docs-search",
  "description": "Semantic search over documentation",
  "source": { "type": "bucket", "bucket_id": "bkt_kb" },
  "feature_extractor": {
    "feature_extractor_name": "text_extractor",
    "version": "v1",
    "input_mappings": {
      "text": "content"
    },
    "parameters": {
      "model": "multilingual-e5-large-instruct",
      "chunk_strategy": "sentence",
      "chunk_size": 512,
      "chunk_overlap": 50
    },
    "field_passthrough": [
      { "source_path": "title" },
      { "source_path": "category" },
      { "source_path": "tags" },
      { "source_path": "published_at" }
    ]
  }
}
Chunking Strategies:
  • sentence – Split on sentence boundaries (best for Q&A)
  • paragraph – Preserve larger context (best for long-form content)
  • fixed – Fixed token windows (predictable chunk sizes)
  • semantic – Use model to detect topic shifts (experimental)
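The sentence strategy with overlap can be sketched in a few lines. This is illustrative only (not Mixpeek's internal chunker) and counts whitespace tokens rather than model tokens: sentences are packed into chunks of at most `chunk_size` tokens, and the last `overlap` tokens are carried into the next chunk so context isn't severed at the boundary.

```python
# Sketch of sentence-boundary chunking with token overlap.
import re

def sentence_chunks(text, chunk_size=512, overlap=50):
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sent in sentences:
        tokens = sent.split()
        if current and len(current) + len(tokens) > chunk_size:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # carry trailing context forward
        current.extend(tokens)
    if current:
        chunks.append(" ".join(current))
    return chunks

doc = ("Mixpeek uses Bearer tokens. Keys are created in the console. "
       "Rotate keys regularly. Never commit keys to git.")
print(sentence_chunks(doc, chunk_size=10, overlap=3))
```

Note how the start of each chunk repeats the tail of the previous one; that repetition is what the `chunk_overlap` parameter controls.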

3. Ingest Documents

POST /v1/buckets/{bucket_id}/objects
{
  "key_prefix": "/docs/api",
  "metadata": {
    "title": "Authentication Guide",
    "content": "Mixpeek uses Bearer token authentication...",
    "category": "getting-started",
    "tags": ["auth", "security", "api-keys"],
    "published_at": "2025-10-01T12:00:00Z"
  }
}
For bulk ingestion, use batch operations:
POST /v1/buckets/{bucket_id}/objects/batch
{
  "objects": [
    { "metadata": {...}, "key_prefix": "/docs/api" },
    { "metadata": {...}, "key_prefix": "/docs/sdk" }
  ]
}

4. Create a Basic Semantic Retriever

POST /v1/retrievers
{
  "retriever_name": "docs-semantic-search",
  "collection_ids": ["col_docs"],
  "input_schema": {
    "properties": {
      "query": { "type": "text", "required": true }
    }
  },
  "stages": [
    {
      "stage_name": "knn_search",
      "version": "v1",
      "parameters": {
        "feature_address": "mixpeek://text_extractor@v1/text_embedding",
        "input_mapping": { "text": "query" },
        "limit": 50
      }
    },
    {
      "stage_name": "sort",
      "version": "v1",
      "parameters": {
        "sort_by": [{ "field": "score", "direction": "desc" }]
      }
    }
  ],
  "cache_config": {
    "enabled": true,
    "ttl_seconds": 300
  }
}
POST /v1/retrievers/{retriever_id}/execute
{
  "inputs": { "query": "how do I authenticate API requests?" },
  "limit": 10,
  "return_urls": false
}
Response:
{
  "results": [
    {
      "document_id": "doc_auth_guide",
      "score": 0.89,
      "metadata": {
        "title": "Authentication Guide",
        "category": "getting-started",
        "tags": ["auth", "security"]
      }
    }
  ],
  "execution_id": "exec_123",
  "cache_hit": false,
  "stage_statistics": {
    "knn_search": { "duration_ms": 45, "results_count": 50 }
  }
}

Model Evolution & A/B Testing

Mixpeek lets you test new models without disrupting production. Create parallel collections, compare results, and migrate seamlessly.

Test New Embedding Models

# Production: Current model
POST /v1/collections
{
  "collection_name": "docs-search-v1",
  "feature_extractor": {
    "parameters": { "model": "multilingual-e5-base" }
  }
}

# Staging: Test larger model
POST /v1/collections
{
  "collection_name": "docs-search-v2",
  "feature_extractor": {
    "parameters": { "model": "multilingual-e5-large-instruct" }
  }
}

Compare Retrieval Quality

# Query both collections
POST /v1/retrievers/ret_v1/execute
{ "inputs": { "query": "authentication guide" } }

POST /v1/retrievers/ret_v2/execute
{ "inputs": { "query": "authentication guide" } }

# Compare metrics
GET /v1/analytics/retrievers/compare?baseline=ret_v1&candidate=ret_v2
Returns:
  • Precision@10: v1 (0.72) vs v2 (0.84) → +16% improvement
  • Latency P95: v1 (45ms) vs v2 (68ms) → acceptable tradeoff
  • User CTR: v1 (34%) vs v2 (41%) → +7-point lift in engagement

Migrate When Ready

# Switch retriever to new collection
PATCH /v1/retrievers/{retriever_id}
{ "collection_ids": ["col_docs_v2"] }

# Archive old collection
DELETE /v1/collections/col_docs_v1
Zero downtime. No index rebuild. Production stays live.

Advanced Patterns

Hybrid Search (Dense + Sparse)

Combine vector embeddings with BM25 keyword matching for best-of-both-worlds:
{
  "stages": [
    {
      "stage_name": "hybrid_search",
      "version": "v1",
      "parameters": {
        "queries": [
          {
            "feature_address": "mixpeek://text_extractor@v1/text_embedding",
            "input_mapping": { "text": "query" },
            "weight": 0.7
          },
          {
            "feature_address": "mixpeek://text_extractor@v1/bm25_sparse",
            "input_mapping": { "text": "query" },
            "weight": 0.3
          }
        ],
        "fusion_method": "rrf",
        "limit": 50
      }
    }
  ]
}
When to use:
  • Queries mixing natural language + exact terminology (“React hooks API reference”)
  • Domain-specific jargon where semantics struggle (product SKUs, error codes)
  • Multilingual content where BM25 handles exact matches and vectors handle translations
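Reciprocal rank fusion (the `"rrf"` fusion method above) can be sketched in plain Python. Each ranked list contributes `1 / (k + rank)` per document, so a document that ranks well in both the dense and the BM25 list rises to the top; in the weighted variant, each list's contribution would be scaled by its `weight`. This is an unweighted illustration, not Mixpeek's implementation:

```python
# Reciprocal rank fusion: merge rankings without comparing raw scores.
def rrf(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["doc_a", "doc_b", "doc_c"]   # vector ranking
sparse = ["doc_b", "doc_d", "doc_a"]   # BM25 ranking
print(rrf([dense, sparse]))  # doc_b edges ahead: top rank in one list, second in the other
```

RRF's appeal is that it never compares cosine scores to BM25 scores directly, which live on incompatible scales; only ranks matter.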

Filter Before Search (for Efficiency)

Apply structured filters before expensive vector operations:
{
  "stages": [
    {
      "stage_name": "filter",
      "version": "v1",
      "parameters": {
        "filters": {
          "operator": "and",
          "conditions": [
            {
              "field": "metadata.category",
              "operator": "eq",
              "value": "getting-started"
            },
            {
              "field": "metadata.published_at",
              "operator": "gte",
              "value": "2025-01-01T00:00:00Z"
            }
          ]
        }
      }
    },
    {
      "stage_name": "knn_search",
      "version": "v1",
      "parameters": {
        "feature_address": "mixpeek://text_extractor@v1/text_embedding",
        "input_mapping": { "text": "query" },
        "limit": 50
      }
    }
  ]
}
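Why filter-first helps is easy to see with a toy brute-force search: the cheap metadata filter runs first, so the expensive similarity step only touches the surviving candidates. (Illustrative only; Mixpeek applies the filter inside its vector index rather than scanning.)

```python
# Pre-filter, then score: similarity is only computed for survivors.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(docs, query_vec, category):
    survivors = [d for d in docs if d["category"] == category]  # cheap filter
    scored = [(dot(d["vec"], query_vec), d["id"]) for d in survivors]  # costly step
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

docs = [
    {"id": "d1", "category": "getting-started", "vec": [0.9, 0.1]},
    {"id": "d2", "category": "reference",       "vec": [0.8, 0.2]},
    {"id": "d3", "category": "getting-started", "vec": [0.2, 0.9]},
]
print(search(docs, [1.0, 0.0], "getting-started"))  # only d1 and d3 are scored
```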

Rerank with Cross-Encoder

Use a cross-encoder model for final reranking (higher accuracy, slower):
{
  "stages": [
    {
      "stage_name": "knn_search",
      "version": "v1",
      "parameters": {
        "limit": 100  # Retrieve more candidates
      }
    },
    {
      "stage_name": "rerank",
      "version": "v1",
      "parameters": {
        "model": "cross-encoder/ms-marco-MiniLM-L-12-v2",
        "input_mapping": {
          "query": "query",
          "document": "metadata.content"
        },
        "top_k": 20  # Return top 20 after reranking
      }
    }
  ]
}
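The retrieve-then-rerank flow above looks like this in plain Python. A real cross-encoder (e.g. the ms-marco model named above) scores each (query, document) pair jointly with a neural model; here a simple token-overlap scorer stands in so the sketch stays runnable:

```python
# Oversample candidates, rescore each (query, document) pair, keep top_k.
def rerank(query, candidates, top_k=20):
    q_tokens = set(query.lower().split())
    def score(doc):  # stand-in for a cross-encoder forward pass
        d_tokens = set(doc["content"].lower().split())
        return len(q_tokens & d_tokens) / max(len(q_tokens), 1)
    return sorted(candidates, key=score, reverse=True)[:top_k]

candidates = [  # imagine 100 knn_search hits here
    {"id": "doc_auth", "content": "authenticate api requests with bearer tokens"},
    {"id": "doc_sdk",  "content": "install the python sdk"},
]
top = rerank("how do I authenticate api requests", candidates, top_k=1)
print(top[0]["id"])  # doc_auth
```

The pattern to note: `limit` on the first stage is deliberately larger than `top_k` on the second, so the reranker has enough candidates to reorder.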

Query Expansion

Generate multiple query variations to improve recall:
{
  "stages": [
    {
      "stage_name": "llm_generation",
      "version": "v1",
      "parameters": {
        "model": "gpt-4o-mini",
        "prompt": "Generate 3 alternative phrasings of this query: {{inputs.query}}",
        "output_format": "json_array"
      }
    },
    {
      "stage_name": "knn_search",
      "version": "v1",
      "parameters": {
        "feature_address": "mixpeek://text_extractor@v1/text_embedding",
        "input_mapping": { "text": "STAGE.llm_generation.expanded_queries" },
        "limit": 50
      }
    }
  ]
}
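When several query variations each return their own hit list, the results need merging. One common approach (a sketch, not necessarily what the pipeline does internally) is to de-duplicate by document and keep each document's best score:

```python
# Merge hit lists from expanded queries: dedupe by doc, keep max score.
def merge_expanded(results_per_query):
    best = {}
    for results in results_per_query:
        for doc_id, score in results:
            best[doc_id] = max(best.get(doc_id, 0.0), score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

hits_q1 = [("doc_a", 0.91), ("doc_b", 0.75)]
hits_q2 = [("doc_b", 0.88), ("doc_c", 0.70)]  # a rephrasing surfaces doc_c
print(merge_expanded([hits_q1, hits_q2]))
```

The recall gain comes from documents like `doc_c` that only match a rephrased query.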

Feedback Loop with Interactions

Record user interactions to improve relevance over time:
# User clicks on result
POST /v1/retrievers/{retriever_id}/interactions
{
  "execution_id": "exec_123",
  "document_id": "doc_auth_guide",
  "interaction_type": "click",
  "metadata": {
    "position": 1,
    "query": "how do I authenticate API requests?"
  }
}

# User provides explicit feedback
POST /v1/retrievers/{retriever_id}/interactions
{
  "execution_id": "exec_123",
  "document_id": "doc_auth_guide",
  "interaction_type": "positive_feedback"
}
Use signals in analytics:
GET /v1/analytics/retrievers/{retriever_id}/signals
Identify low-CTR queries or high-negative-feedback documents to refine taxonomy mappings or retrain models.
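The CTR aggregation behind that analysis is straightforward. The sketch below shows the kind of per-query computation the signals endpoint performs; the flat lists of query strings are a simplification of real execution and interaction records:

```python
# Per-query click-through rate from execution and click logs.
from collections import defaultdict

def ctr_by_query(executions, clicks):
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query in executions:
        shown[query] += 1
    for query in clicks:
        clicked[query] += 1
    return {q: clicked[q] / shown[q] for q in shown}

executions = ["auth", "auth", "auth", "billing"]
clicks = ["auth", "auth"]
print(ctr_by_query(executions, clicks))  # "billing" never clicked: CTR 0.0
```

Queries whose CTR sits near zero are the first candidates for synonym mappings, chunking changes, or a different embedding model.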

Chunking Best Practices

| Content Type | Recommended Strategy | Chunk Size | Overlap |
| --- | --- | --- | --- |
| Technical docs | sentence | 256-512 tokens | 50 tokens |
| Long-form articles | paragraph | 512-1024 tokens | 100 tokens |
| FAQs | semantic (detect Q&A boundaries) | Variable | 0 |
| Code snippets | fixed (preserve syntax) | 256 tokens | 20 tokens |
| Product descriptions | sentence | 128-256 tokens | 25 tokens |
General rules:
  • Smaller chunks = better precision, worse recall
  • Larger chunks = more context, but noisier matches
  • Overlap prevents splitting relevant context across boundaries

Model Selection

| Model | Latency | Accuracy | Use Case |
| --- | --- | --- | --- |
| multilingual-e5-base | Fast | Good | High-volume, cost-sensitive |
| multilingual-e5-large-instruct | Medium | Excellent | General-purpose semantic search |
| bge-large-en-v1.5 | Medium | Excellent | English-only, high accuracy |
| openai/text-embedding-3-large | Slow | Best | Premium use cases, multilingual |
| cohere/embed-english-v3 | Medium | Excellent | Domain-specific fine-tuning |
Test with your data:
POST /v1/retrievers/debug-inference
{
  "model": "multilingual-e5-large-instruct",
  "text": "sample query",
  "return_embedding": true
}

Performance Optimization

1. Cache Aggressively

{
  "cache_config": {
    "enabled": true,
    "ttl_seconds": 600,  # 10 minutes for stable queries
    "cache_stage_names": ["knn_search", "rerank"]  # Only cache expensive stages
  }
}
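What `cache_config` buys you is easy to picture with a minimal TTL cache (an illustration of the behavior, not Mixpeek's implementation): identical queries arriving within `ttl_seconds` skip the vector search entirely.

```python
# Minimal TTL cache keyed by query: hits within the TTL skip the search.
import time

class TTLCache:
    def __init__(self, ttl_seconds=600):
        self.ttl = ttl_seconds
        self.store = {}

    def get(self, key):
        hit = self.store.get(key)
        if hit is None:
            return None
        value, expires = hit
        if time.monotonic() > expires:
            del self.store[key]  # entry expired; fall through to a fresh search
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=600)
cache.set("how do I authenticate?", ["doc_auth_guide"])
print(cache.get("how do I authenticate?"))  # cache hit within the TTL
```

The TTL is the staleness you're willing to tolerate: 10 minutes is fine for documentation, too long for fast-moving inventory.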

2. Tune Limit Values

# Retrieve more candidates for reranking
{
  "stages": [
    { "stage_name": "knn_search", "parameters": { "limit": 200 } },
    { "stage_name": "rerank", "parameters": { "top_k": 20 } }
  ]
}

3. Use Pre-Filters

Filter by category, date, or other metadata before vector search to reduce search space.

4. Monitor Analytics

GET /v1/analytics/retrievers/{retriever_id}/performance
GET /v1/analytics/retrievers/{retriever_id}/stages
Identify slow stages and optimize (e.g., disable reranking for low-value queries).

Use Case Examples

  • Customer support knowledge base: Ingest help articles, FAQs, and troubleshooting guides. Use hybrid search to handle both natural language questions (“Why isn’t my API key working?”) and exact error codes (“401 Unauthorized”).
  • Research paper discovery: Chunk academic papers by section, embed abstracts and full text. Enable researchers to find relevant papers by concept (“neural architecture search for vision transformers”) rather than exact citation matching.

Evaluation & Tuning

Offline Evaluation

Create a golden dataset with query-document pairs:
POST /v1/retrievers/{retriever_id}/evaluations
{
  "test_queries": [
    {
      "query": "how to authenticate",
      "relevant_doc_ids": ["doc_auth_guide", "doc_api_keys"]
    }
  ],
  "metrics": ["precision@10", "recall@10", "mrr", "ndcg"]
}

A/B Testing

Create retriever variants and compare:
# Variant A: Vector-only
POST /v1/retrievers { ... "retriever_name": "search-vector" }

# Variant B: Hybrid
POST /v1/retrievers { ... "retriever_name": "search-hybrid" }
Split traffic and monitor:
  • Click-through rate (CTR)
  • Time to first click
  • Zero-result queries
  • Negative feedback rate

Next Steps