The Rerank stage uses cross-encoder models to re-score and reorder search results. Unlike bi-encoder models (used in semantic search), cross-encoders process the query and document together, enabling more accurate relevance scoring at the cost of higher latency.
Stage Category: SORT (reorders documents)
Transformation: N documents → top_n documents (reranked by relevance)
When to Use
| Use Case | Description |
| --- | --- |
| Two-stage retrieval | Fast recall (search) + precise ranking (rerank) |
| High-precision requirements | When ranking quality is critical |
| Top-N optimization | Improve quality of final displayed results |
| RAG applications | Better context selection for LLM generation |
When NOT to Use
| Scenario | Recommended Alternative |
| --- | --- |
| Large result sets (1000+) | Too slow; use sort_by_field |
| Real-time requirements (< 20ms) | Use search scores directly |
| Simple attribute sorting | sort_by_field |
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | string | Required | Reranker model to use |
| `top_n` | integer | 10 | Number of results to return after reranking |
| `query` | string | `{{INPUT.query}}` | Query for relevance scoring |
Available Models
| Model | Speed | Quality | Best For |
| --- | --- | --- | --- |
| bge-reranker-v2-m3 | Fast | High | General purpose, multilingual |
| cohere-rerank-v3 | Medium | Highest | Maximum accuracy |
| jina-reranker-v2 | Fast | High | Multilingual, long documents |
Configuration Examples
Basic Reranking
{
  "stage_type": "sort",
  "stage_id": "rerank",
  "parameters": {
    "model": "bge-reranker-v2-m3",
    "top_n": 10
  }
}
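High-Quality Reranking
When ranking accuracy matters more than latency, a configuration along these lines swaps in cohere-rerank-v3 from the model table above (the values shown are illustrative):
{
  "stage_type": "sort",
  "stage_id": "rerank",
  "parameters": {
    "model": "cohere-rerank-v3",
    "top_n": 10
  }
}
Custom Query
The query parameter defaults to {{INPUT.query}} but can be overridden to rerank against a different string; the template variable below is a placeholder for whatever your pipeline input exposes:
{
  "stage_type": "sort",
  "stage_id": "rerank",
  "parameters": {
    "model": "bge-reranker-v2-m3",
    "top_n": 10,
    "query": "{{INPUT.rerank_query}}"
  }
}
Large Candidate Pool
Candidate pool size is controlled by the preceding search stage rather than the rerank stage itself. A sketch that widens recall while staying under the ~500-document practical limit described below:
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 300
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 20
    }
  }
]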
How Cross-Encoders Work
| Bi-Encoder (Search) | Cross-Encoder (Rerank) |
| --- | --- |
| Query and doc encoded separately | Query + doc encoded together |
| Pre-compute doc embeddings | Must process each pair |
| Fast (< 10ms for millions) | Slower (50-100ms for 100 docs) |
| Good approximate ranking | Precise relevance scoring |
Cross-encoders see the full context of both query and document together, enabling better understanding of semantic relationships.
Two-Stage Retrieval Pattern
The recommended pattern is fast recall followed by precise reranking:
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  }
]
Why this works:
- Search stage: fast, retrieves 100 candidates (< 20ms)
- Rerank stage: slower but precise, picks the best 10 (50-100ms)
- Total: high-quality results in 70-120ms
Performance
| Metric | Value |
| --- | --- |
| Latency | 50-100ms (depends on candidate count) |
| Optimal input size | 50-200 documents |
| Maximum practical | ~500 documents |
| Batching | Automatic |
Reranking 1000+ documents is not recommended. Use top_k limits in the search stage to control candidate pool size.
Output
Each returned document includes:
| Field | Type | Description |
| --- | --- | --- |
| `document_id` | string | Unique document identifier |
| `score` | float | Reranker relevance score |
| `original_score` | float | Score from the previous stage |
| `rerank_position` | integer | Position after reranking |
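For illustration only, a single reranked result might carry these fields (all values are placeholders):
{
  "document_id": "doc_123",
  "score": 0.94,
  "original_score": 0.71,
  "rerank_position": 1
}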
Common Pipeline Patterns
Search + Rerank + Limit
[
  {
    "stage_type": "filter",
    "stage_id": "hybrid_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 20
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "limit",
    "parameters": {
      "limit": 5
    }
  }
]
Search + Filter + Rerank
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 200
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "structured_filter",
    "parameters": {
      "conditions": {
        "field": "metadata.category",
        "operator": "eq",
        "value": "{{INPUT.category}}"
      }
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  }
]
Trade-offs
| Aspect | Impact |
| --- | --- |
| Higher precision | Better relevance scoring |
| Higher latency | 50-100ms per batch |
| Limited scale | Best for < 500 candidates |
| API costs | Per-document scoring |