The Query Expand stage uses language models to generate multiple query variations from the original query, executes searches for each variation, and fuses the results. This improves recall by capturing different phrasings and aspects of the user’s intent.
**Stage Category**: FILTER (generates and fuses search results)
**Transformation**: 1 query → N query variations → fused results
When to Use
| Use Case | Description |
|---|---|
| Improved recall | Capture documents that match alternative phrasings |
| Ambiguous queries | Handle queries with multiple interpretations |
| Synonym expansion | Find documents using different terminology |
| Multi-aspect search | Break complex queries into sub-queries |
When NOT to Use
| Scenario | Recommended Alternative |
|---|---|
| Simple keyword search | `semantic_search` directly |
| Low latency requirements | Pre-compute expansions |
| Precise single-intent queries | Standard search |
| Cost-sensitive applications | Use simpler search |
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | Required | LLM model for query generation |
| `query` | string | `{{INPUT.query}}` | Original query to expand |
| `num_variations` | integer | `3` | Number of query variations to generate |
| `vector_index` | string | Required | Vector index for searches |
| `top_k` | integer | `20` | Results per query variation |
| `fusion_method` | string | `rrf` | Result fusion method: `rrf`, `linear`, `max` |
| `expansion_prompt` | string | auto | Custom prompt for query generation |
Fusion Methods
| Method | Description | Best For |
|---|---|---|
| `rrf` | Reciprocal Rank Fusion | General purpose, balanced |
| `linear` | Weighted score combination | When scores are comparable |
| `max` | Take maximum score | When any match is good |
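As a rough illustration of how the fusion methods differ, the sketch below combines one document's per-variation scores. It is not the stage's actual implementation, and it assumes `linear` averages with equal weights:

```python
# Per-variation scores for a single document; None means the
# document was not returned for that query variation.
scores = [0.9, 0.4, None]
hits = [s for s in scores if s is not None]

linear_score = sum(hits) / len(scores)  # equal-weight linear combination
max_score = max(hits)                   # best single match wins
```

`max` rewards a document that matches any one variation strongly, while `linear` favors documents that score consistently across variations.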
Configuration Examples
Basic Query Expansion
```json
{
  "stage_type": "filter",
  "stage_id": "query_expand",
  "parameters": {
    "model": "gpt-4o-mini",
    "query": "{{INPUT.query}}",
    "vector_index": "text_extractor_v1_embedding",
    "num_variations": 3,
    "top_k": 30
  }
}
```
How Query Expansion Works
1. **Original Query**: "how to fix memory leaks"
2. **LLM Generates Variations**:
   - "memory leak detection and resolution"
   - "debugging memory issues in applications"
   - "preventing memory leaks in code"
3. **Execute Searches**: Run vector search for each variation
4. **Fuse Results**: Combine using RRF or other fusion method
5. **Return**: Deduplicated, ranked result set
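The steps above can be sketched end to end. This is a minimal illustration, not the stage's source: `generate_variations` and `vector_search` stand in for the LLM and vector-index calls, and RRF with `k = 60` is assumed as the fusion method:

```python
def expand_and_search(query, generate_variations, vector_search,
                      num_variations=3, k=60):
    """Expand a query, search each variation, and fuse with RRF."""
    variations = generate_variations(query, n=num_variations)
    scores = {}
    for variation in variations:
        # Each search returns document IDs in ranked order (rank 1 = best).
        for rank, doc_id in enumerate(vector_search(variation), start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Deduplicated, ranked result set
    return sorted(scores, key=scores.get, reverse=True)
```

Documents returned by several variations accumulate score across them, which is what pushes broadly relevant documents to the top of the fused list.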
Reciprocal Rank Fusion (RRF)
RRF combines results from multiple queries using the formula:

```
score(doc) = Σ 1 / (k + rank_i)
```

where `k` is typically 60 and `rank_i` is the document's rank in query *i*'s results.
| Advantage | Description |
|---|---|
| Score-agnostic | Works with different scoring scales |
| Rank-based | Focuses on relative ordering |
| Self-balancing | No manual weight tuning |
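The RRF formula can be sketched in a few lines of Python (an illustration of the formula, not the stage's implementation):

```python
def rrf_fuse(result_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    result_lists: one ranked list per query variation (rank 1 = best).
    Returns document IDs sorted by descending RRF score.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "a" leads both lists, so it tops the fused ranking.
fused = rrf_fuse([["a", "b", "c"], ["a", "d"]])
```

Because only ranks enter the formula, the raw similarity scores of the individual searches never need to be on the same scale.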
Output Schema
Each document includes fusion metadata:
```json
{
  "document_id": "doc_123",
  "content": "Document content...",
  "score": 0.87,
  "query_expand": {
    "matched_variations": ["memory leak detection", "debugging memory issues"],
    "fusion_score": 0.87,
    "individual_ranks": [2, 5, null]
  }
}
```
| Metric | Value |
|---|---|
| Latency | 300–800 ms (LLM + searches) |
| LLM calls | 1 per execution |
| Search calls | N (`num_variations`) |
| Token usage | ~50–100 tokens |
Query expansion adds latency due to LLM generation and multiple searches. Use judiciously for queries where recall improvement justifies the cost.
Common Pipeline Patterns
Expanded Search + Rerank
```json
[
  {
    "stage_type": "filter",
    "stage_id": "query_expand",
    "parameters": {
      "model": "gpt-4o-mini",
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "num_variations": 3,
      "top_k": 50
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  }
]
```
Expansion + Filter + Summarize
```json
[
  {
    "stage_type": "filter",
    "stage_id": "query_expand",
    "parameters": {
      "model": "gpt-4o-mini",
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "num_variations": 4,
      "top_k": 30
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "structured_filter",
    "parameters": {
      "conditions": {
        "field": "metadata.verified",
        "operator": "eq",
        "value": true
      }
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "summarize",
    "parameters": {
      "model": "gpt-4o",
      "prompt": "Answer based on the documents: {{INPUT.query}}"
    }
  }
]
```
Cost Optimization
| Strategy | Impact |
|---|---|
| Reduce `num_variations` | Fewer searches |
| Use a cheaper LLM | `gpt-4o-mini` vs `gpt-4o` |
| Lower `top_k` per variation | Less fusion overhead |
| Cache common expansions | Fewer LLM calls |
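Caching common expansions can be done client-side with `functools.lru_cache`. This is a usage pattern around the stage, not a built-in feature, and `llm_expand` here is a stand-in for your real LLM call:

```python
from functools import lru_cache

def llm_expand(query):
    """Stand-in for the real LLM expansion call (hypothetical)."""
    return [f"{query} synonyms", f"{query} troubleshooting"]

@lru_cache(maxsize=1024)
def cached_variations(query: str):
    # Returns a tuple so the cached value is immutable; repeat
    # queries hit the cache and skip the LLM call entirely.
    return tuple(llm_expand(query))
```

For repeated head queries this removes the LLM call from the hot path, leaving only the N vector searches.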
Error Handling
| Error | Behavior |
|---|---|
| LLM failure | Fall back to original query |
| Search failure | Skip that variation |
| Empty expansions | Use original query only |
| Timeout | Return partial results |
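The fallback rows for LLM failure and empty expansions amount to: if expansion raises or returns nothing, search with the original query alone. A minimal sketch (illustrative, not the stage's source):

```python
def safe_expand(query, generate_variations):
    """Return query variations, falling back to the original query."""
    try:
        variations = generate_variations(query)
    except Exception:
        variations = []  # LLM failure: degrade to the original query
    # Empty expansions also fall through to the original query only.
    return variations or [query]
```

The key property is that expansion failures degrade recall back to a plain single-query search rather than failing the whole pipeline.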