(Figure: Query Expand stage showing LLM-powered query variations and result fusion)
The Query Expand stage uses language models to generate multiple query variations from the original query, executes searches for each variation, and fuses the results. This improves recall by capturing different phrasings and aspects of the user’s intent.
Stage Category: FILTER (generates and fuses search results)
Transformation: 1 query → N query variations → fused results

When to Use

| Use Case | Description |
| --- | --- |
| Improved recall | Capture documents that match alternative phrasings |
| Ambiguous queries | Handle queries with multiple interpretations |
| Synonym expansion | Find documents using different terminology |
| Multi-aspect search | Break complex queries into sub-queries |

When NOT to Use

| Scenario | Recommended Alternative |
| --- | --- |
| Simple keyword search | `semantic_search` directly |
| Low latency requirements | Pre-compute expansions |
| Precise single-intent queries | Standard search |
| Cost-sensitive applications | Use simpler search |

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | string | Required | LLM model for query generation |
| `query` | string | `{{INPUT.query}}` | Original query to expand |
| `num_variations` | integer | `3` | Number of query variations to generate |
| `vector_index` | string | Required | Vector index for searches |
| `top_k` | integer | `20` | Results per query variation |
| `fusion_method` | string | `rrf` | Result fusion method: `rrf`, `linear`, or `max` |
| `expansion_prompt` | string | auto | Custom prompt for query generation |

Fusion Methods

| Method | Description | Best For |
| --- | --- | --- |
| `rrf` | Reciprocal Rank Fusion | General purpose, balanced |
| `linear` | Weighted score combination | When scores are comparable |
| `max` | Take the maximum score | When any single match is good |
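The `linear` and `max` methods can be sketched as follows. This is an illustration only; the `score_maps` shape (one `{doc_id: score}` dict per query variation) is an assumption for the sketch, not the stage's internal representation:

```python
def linear_fuse(score_maps, weights=None):
    """Weighted sum of per-variation scores (the `linear` method).

    score_maps: list of {doc_id: score} dicts, one per query variation.
    Assumes scores are on comparable scales, as the table notes.
    """
    weights = weights or [1.0] * len(score_maps)
    fused = {}
    for w, scores in zip(weights, score_maps):
        for doc_id, s in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * s
    return fused


def max_fuse(score_maps):
    """Keep each document's best score across variations (the `max` method)."""
    fused = {}
    for scores in score_maps:
        for doc_id, s in scores.items():
            fused[doc_id] = max(fused.get(doc_id, s), s)
    return fused
```

`linear` rewards documents that score well across many variations, while `max` only asks that a document match at least one variation strongly.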

Configuration Examples

{
  "stage_type": "filter",
  "stage_id": "query_expand",
  "parameters": {
    "model": "gpt-4o-mini",
    "query": "{{INPUT.query}}",
    "vector_index": "text_extractor_v1_embedding",
    "num_variations": 3,
    "top_k": 30
  }
}

How Query Expansion Works

  1. Original Query: “how to fix memory leaks”
  2. LLM Generates Variations:
    • “memory leak detection and resolution”
    • “debugging memory issues in applications”
    • “preventing memory leaks in code”
  3. Execute Searches: Run vector search for each variation
  4. Fuse Results: Combine using RRF or other fusion method
  5. Return: Deduplicated, ranked result set
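The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the stage's implementation: `generate_variations` and `vector_search` are hypothetical stand-ins for the LLM call and the vector-index search, and RRF with k = 60 is used for fusion:

```python
def query_expand(query, generate_variations, vector_search,
                 num_variations=3, top_k=20, k=60):
    """Sketch of the expand -> search -> fuse flow.

    generate_variations(query, n) and vector_search(query, top_k) are
    hypothetical stand-ins for the LLM call and the vector-index search.
    Returns a deduplicated list of document IDs, best first.
    """
    # Steps 1-2: original query plus LLM-generated variations
    queries = [query] + generate_variations(query, num_variations)
    scores = {}
    # Steps 3-4: search each variation and accumulate RRF scores
    for q in queries:
        for rank, doc_id in enumerate(vector_search(q, top_k), start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Step 5: deduplicated, ranked result set
    return sorted(scores, key=scores.get, reverse=True)
```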

Reciprocal Rank Fusion (RRF)

RRF combines results from multiple queries using the formula:
score(doc) = Σ 1 / (k + rank_i)
Where k is typically 60, and rank_i is the document’s rank in query i’s results.
| Advantage | Description |
| --- | --- |
| Score-agnostic | Works with different scoring scales |
| Rank-based | Focuses on relative ordering |
| Self-balancing | No manual weight tuning |
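The RRF formula can be implemented in a few lines. This sketch fuses plain lists of document IDs rather than full result objects:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked result lists with Reciprocal Rank Fusion.

    ranked_lists: one list of document IDs per query variation, best
    match first. Returns doc IDs sorted by fused score, descending.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            # score(doc) = sum over queries of 1 / (k + rank_i)
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Note that a document absent from one variation's results (a `null` rank in the output schema) simply contributes nothing to the sum for that variation.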

Output Schema

Each document includes fusion metadata:
{
  "document_id": "doc_123",
  "content": "Document content...",
  "score": 0.87,
  "query_expand": {
    "matched_variations": ["memory leak detection", "debugging memory issues"],
    "fusion_score": 0.87,
    "individual_ranks": [2, 5, null]
  }
}

Performance

| Metric | Value |
| --- | --- |
| Latency | 300-800 ms (LLM + searches) |
| LLM calls | 1 per execution |
| Search calls | N (`num_variations`) |
| Token usage | ~50-100 tokens |
Query expansion adds latency due to LLM generation and multiple searches. Use judiciously for queries where recall improvement justifies the cost.

Common Pipeline Patterns

Expanded Search + Rerank

[
  {
    "stage_type": "filter",
    "stage_id": "query_expand",
    "parameters": {
      "model": "gpt-4o-mini",
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "num_variations": 3,
      "top_k": 50
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  }
]

Expansion + Filter + Summarize

[
  {
    "stage_type": "filter",
    "stage_id": "query_expand",
    "parameters": {
      "model": "gpt-4o-mini",
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "num_variations": 4,
      "top_k": 30
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "structured_filter",
    "parameters": {
      "conditions": {
        "field": "metadata.verified",
        "operator": "eq",
        "value": true
      }
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "summarize",
    "parameters": {
      "model": "gpt-4o",
      "prompt": "Answer based on the documents: {{INPUT.query}}"
    }
  }
]

Cost Optimization

| Strategy | Impact |
| --- | --- |
| Reduce `num_variations` | Fewer searches |
| Use a cheaper LLM | `gpt-4o-mini` vs `gpt-4o` |
| Lower `top_k` per variation | Less fusion overhead |
| Cache common expansions | Reduce LLM calls |
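Caching common expansions can be as simple as memoizing the expansion call. In this sketch, `_call_llm` is a hypothetical stand-in for the real model call:

```python
from functools import lru_cache

def _call_llm(query, n):
    # Hypothetical stand-in for the real LLM call.
    return [f"{query} (variation {i + 1})" for i in range(n)]

@lru_cache(maxsize=1024)
def cached_expansions(query, num_variations=3):
    """Return LLM-generated variations, cached per (query, n) pair.

    A tuple is returned because lru_cache requires hashable values.
    """
    return tuple(_call_llm(query, num_variations))
```

Repeated queries then skip the LLM entirely, removing the largest latency and cost component for hot queries.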

Error Handling

| Error | Behavior |
| --- | --- |
| LLM failure | Fall back to the original query |
| Search failure | Skip that variation |
| Empty expansions | Use the original query only |
| Timeout | Return partial results |
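The first two fallback behaviors can be sketched as follows. This is illustrative only; `generate_variations` is a hypothetical stand-in for the LLM call:

```python
def expand_with_fallback(query, generate_variations):
    """Return the list of queries to search, degrading gracefully.

    On any LLM failure, or when the LLM returns no variations, the
    stage falls back to searching the original query alone.
    """
    try:
        variations = generate_variations(query)
    except Exception:
        variations = []          # LLM failure: fall back to original query
    if not variations:
        return [query]           # Empty expansions: use original query only
    return [query] + variations
```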