Hybrid search stage combining vector and full-text search with RRF fusion
The Hybrid Search stage combines semantic (vector) search with full-text (BM25) search, merging results using Reciprocal Rank Fusion (RRF). The fused ranking pairs semantic understanding of meaning with exact keyword matching.
Stage Category: FILTER (reduces the document set)
Transformation: Collection → top_k documents (ranked by fused score)

When to Use

| Use Case | Description |
| --- | --- |
| Product search | Exact model numbers + semantic descriptions |
| Technical documentation | Function names + conceptual explanations |
| Mixed queries | Users mix exact terms with natural language |
| E-commerce | "iPhone 15 Pro Max 256GB black" |
| Better recall | When pure semantic search misses exact matches |

When NOT to Use

| Scenario | Recommended Alternative |
| --- | --- |
| Pure semantic queries | semantic_search (faster) |
| Exact field matching only | structured_filter |
| Low latency requirements | semantic_search (single index) |

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | Required | Search query (supports templates) |
| vector_index | string | Required | Vector index for semantic search |
| text_field | string | content | Field for full-text search |
| top_k | integer | 100 | Number of candidates to retrieve |
| vector_weight | float | 0.7 | Weight for semantic results (0.0-1.0) |
| text_weight | float | 0.3 | Weight for text results (0.0-1.0) |
| rrf_k | integer | 60 | RRF constant (higher = less rank sensitivity) |
| min_score | float | 0.0 | Minimum fused score threshold |
| filters | object | null | Pre-filter conditions |

Configuration Examples

{
  "stage_type": "filter",
  "stage_id": "hybrid_search",
  "parameters": {
    "query": "{{INPUT.query}}",
    "vector_index": "text_extractor_v1_embedding",
    "top_k": 100,
    "vector_weight": 0.7,
    "text_weight": 0.3
  }
}
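
For pipelines assembled in code, the same stage can be built from a small helper. The sketch below is illustrative only: `build_hybrid_search_stage` is a hypothetical function, not part of any SDK, and its range checks simply mirror the 0.0-1.0 ranges and defaults listed in the Parameters table.

```python
import json

def build_hybrid_search_stage(query_template, vector_index, top_k=100,
                              vector_weight=0.7, text_weight=0.3, rrf_k=60):
    """Return a hybrid_search stage dict (hypothetical helper, not an SDK call)."""
    # Both weights are documented as floats in the range 0.0-1.0.
    for name, weight in (("vector_weight", vector_weight), ("text_weight", text_weight)):
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {weight}")
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    return {
        "stage_type": "filter",
        "stage_id": "hybrid_search",
        "parameters": {
            "query": query_template,
            "vector_index": vector_index,
            "top_k": top_k,
            "vector_weight": vector_weight,
            "text_weight": text_weight,
            "rrf_k": rrf_k,
        },
    }

stage = build_hybrid_search_stage("{{INPUT.query}}", "text_extractor_v1_embedding")
print(json.dumps(stage, indent=2))
```

Validating on the client side catches an out-of-range weight before the pipeline is submitted; whether the service applies the same checks is not stated here.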

How RRF Works

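Reciprocal Rank Fusion combines ranked lists from multiple search systems. Each document's fused score sums a weighted reciprocal-rank term over every result list i in which the document appears:
RRF_score = Σ_i weight_i × 1 / (k + rank_i)
Here rank_i is the document's rank in list i, weight_i is that list's weight (vector_weight or text_weight), and k is the rrf_k constant.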
| Parameter | Effect |
| --- | --- |
| Higher rrf_k | More equal treatment of all ranks |
| Lower rrf_k | Top ranks dominate more |
| Higher vector_weight | Semantic results prioritized |
| Higher text_weight | Exact matches prioritized |
The default rrf_k=60 works well for most cases. Decrease to 10-20 if you want top results to matter more; increase to 100+ for more equal rank treatment.
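
To make the fusion step concrete, here is a minimal sketch of weighted RRF in Python. It is not the stage's actual implementation: it assumes 1-based ranks, treats a document missing from a list as contributing nothing for that list, and uses made-up document IDs.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, rrf_k=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    ranked_lists: one list of document IDs per search system, best first.
    weights: one weight per list, in the same order.
    Assumption: ranks are 1-based; absent documents add 0 for that list.
    """
    scores = defaultdict(float)
    for doc_ids, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(doc_ids, start=1):
            scores[doc_id] += weight / (rrf_k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]  # illustrative semantic ranking
text_hits = ["doc_c", "doc_a", "doc_d"]    # illustrative BM25 ranking
fused = weighted_rrf([vector_hits, text_hits], weights=[0.7, 0.3])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Under these assumptions, `doc_a` (ranked first and second) edges out `doc_c` (ranked third and first) because the vector list carries more weight. Note also that with rrf_k=60 and weights summing to 1, no fused score can exceed 1/61 ≈ 0.0164, so a min_score threshold would need to be chosen on that scale.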

Weight Selection Guide

| Query Type | Vector Weight | Text Weight | Example |
| --- | --- | --- | --- |
| Natural language | 0.85 | 0.15 | "comfortable shoes for running" |
| Mixed | 0.7 | 0.3 | "Nike running shoes comfortable" |
| Product search | 0.5 | 0.5 | "iPhone 15 Pro Max 256GB" |
| Technical docs | 0.6 | 0.4 | "async/await error handling" |
| Code search | 0.4 | 0.6 | "function calculateTotal" |
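
If the application already knows what kind of query it is handling, the guide above can be encoded as a lookup. This is only a sketch: the preset keys and the `weights_for` helper are hypothetical names, and falling back to the "mixed" preset for unknown types is an assumption, chosen because it matches the stage defaults.

```python
# Values copied from the Weight Selection Guide; the keys are hypothetical labels.
WEIGHT_PRESETS = {
    "natural_language": (0.85, 0.15),
    "mixed": (0.7, 0.3),
    "product_search": (0.5, 0.5),
    "technical_docs": (0.6, 0.4),
    "code_search": (0.4, 0.6),
}

def weights_for(query_type):
    """Return vector/text weights for a query type, defaulting to 'mixed'."""
    vector_weight, text_weight = WEIGHT_PRESETS.get(query_type, WEIGHT_PRESETS["mixed"])
    return {"vector_weight": vector_weight, "text_weight": text_weight}

print(weights_for("code_search"))
```

The returned dict can be merged into the stage's `parameters` object before the pipeline runs.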

Output

Each returned document includes:
| Field | Type | Description |
| --- | --- | --- |
| document_id | string | Unique document identifier |
| score | float | Fused RRF score |
| vector_score | float | Semantic similarity score |
| text_score | float | BM25 text match score |
| content | string | Document content |
| metadata | object | Document metadata |

Performance

| Metric | Value |
| --- | --- |
| Latency | 20-100ms (depends on index sizes) |
| Parallel execution | Vector and text search run concurrently |
| Fusion overhead | < 5ms |
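
Because the two searches run concurrently, total latency is roughly that of the slower search plus the fusion overhead, not the sum of both. The sketch below illustrates the pattern with `asyncio.gather`; `vector_search` and `text_search` are hypothetical stand-ins that sleep instead of querying a real index, and the delays are arbitrary.

```python
import asyncio

async def vector_search(query):
    # Stand-in for a vector index lookup (hypothetical, simulated delay).
    await asyncio.sleep(0.05)
    return ["doc_a", "doc_b"]

async def text_search(query):
    # Stand-in for a BM25 full-text lookup (hypothetical, simulated delay).
    await asyncio.sleep(0.03)
    return ["doc_b", "doc_c"]

async def retrieve_candidates(query):
    # Both coroutines are scheduled together; gather returns results in call order.
    return await asyncio.gather(vector_search(query), text_search(query))

vector_hits, text_hits = asyncio.run(retrieve_candidates("running shoes"))
print(vector_hits, text_hits)
```

In this simulation the combined wait is close to the 50 ms of the slower call rather than 80 ms, which is the effect the "Parallel execution" row describes.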

Common Pipeline Patterns

Hybrid Search + Rerank

[
  {
    "stage_type": "filter",
    "stage_id": "hybrid_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100,
      "vector_weight": 0.7,
      "text_weight": 0.3
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  }
]

Hybrid Search + Post-Filter

[
  {
    "stage_type": "filter",
    "stage_id": "hybrid_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 200
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "structured_filter",
    "parameters": {
      "conditions": {
        "field": "metadata.price",
        "operator": "lte",
        "value": "{{INPUT.max_price}}"
      }
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "limit",
    "parameters": {
      "limit": 10
    }
  }
]
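
The pipeline above retrieves 200 candidates rather than 100 because the price filter runs after retrieval and may discard many of them before the limit is applied. A rough local simulation of the last two stages, using made-up candidates and a hypothetical `post_filter_and_limit` helper, shows the effect:

```python
def post_filter_and_limit(candidates, max_price, limit=10):
    """Keep candidates with metadata.price <= max_price, then take the first `limit`.

    Assumes `candidates` is already sorted by fused score, best first.
    """
    kept = [doc for doc in candidates if doc["metadata"]["price"] <= max_price]
    return kept[:limit]

# Illustrative data only; scores and prices are invented for the example.
candidates = [
    {"document_id": "p1", "score": 0.0163, "metadata": {"price": 1199}},
    {"document_id": "p2", "score": 0.0160, "metadata": {"price": 799}},
    {"document_id": "p3", "score": 0.0113, "metadata": {"price": 999}},
    {"document_id": "p4", "score": 0.0048, "metadata": {"price": 649}},
]
results = post_filter_and_limit(candidates, max_price=1000, limit=2)
print([doc["document_id"] for doc in results])
```

The top-scoring candidate is dropped by the filter, so the final list starts at the second-best match. If the filter is very selective, a larger top_k (or the stage's `filters` pre-filter parameter) helps keep the final list full.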

Comparison: Semantic vs Hybrid

| Feature | Semantic Search | Hybrid Search |
| --- | --- | --- |
| Exact matches | May miss | Captured |
| Conceptual matches | Excellent | Excellent |
| Latency | 5-50ms | 20-100ms |
| Best for | Natural language | Mixed queries |
| Product/SKU search | Poor | Good |