[Figure: Rerank stage showing a cross-encoder model re-scoring search results]
The Rerank stage uses cross-encoder models to re-score and reorder search results. Unlike bi-encoder models (used in semantic search), cross-encoders process the query and document together, enabling more accurate relevance scoring at the cost of higher latency.
Stage Category: SORT (reorders documents)
Transformation: N documents → top_n documents (re-ranked by relevance)

When to Use

| Use Case | Description |
| --- | --- |
| Two-stage retrieval | Fast recall (search) + precise ranking (rerank) |
| High-precision requirements | When ranking quality is critical |
| Top-N optimization | Improve quality of final displayed results |
| RAG applications | Better context selection for LLM generation |

When NOT to Use

| Scenario | Recommended Alternative |
| --- | --- |
| Large result sets (1000+) | Too slow; use sort_by_field |
| Real-time requirements (< 20ms) | Use search scores directly |
| Simple attribute sorting | sort_by_field |

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | Required | Reranker model to use |
| top_n | integer | 10 | Number of results to return after reranking |
| query | string | {{INPUT.query}} | Query for relevance scoring |

Available Models

| Model | Speed | Quality | Best For |
| --- | --- | --- | --- |
| bge-reranker-v2-m3 | Fast | High | General purpose, multilingual |
| cohere-rerank-v3 | Medium | Highest | Maximum accuracy |
| jina-reranker-v2 | Fast | High | Multilingual, long documents |

Configuration Examples

{
  "stage_type": "sort",
  "stage_id": "rerank",
  "parameters": {
    "model": "bge-reranker-v2-m3",
    "top_n": 10
  }
}

How Cross-Encoders Work

| Bi-Encoder (Search) | Cross-Encoder (Rerank) |
| --- | --- |
| Query and doc encoded separately | Query + doc encoded together |
| Pre-compute doc embeddings | Must process each query-doc pair |
| Fast (< 10ms for millions of docs) | Slower (50-100ms for 100 docs) |
| Good approximate ranking | Precise relevance scoring |
Cross-encoders see the full context of both query and document together, enabling better understanding of semantic relationships.
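
For intuition, the same joint scoring can be reproduced outside the pipeline. The sketch below assumes the open-source sentence-transformers library and the public BAAI/bge-reranker-v2-m3 checkpoint; the hosted rerank stage runs the model server-side, so this is only an illustration of the scoring step.

from sentence_transformers import CrossEncoder

# Load a cross-encoder checkpoint (assumed here; any supported reranker works).
model = CrossEncoder("BAAI/bge-reranker-v2-m3")

query = "how do I rotate an API key?"
candidates = [
    "API keys can be rotated from the security settings page.",
    "Our office hours are Monday to Friday, 9am to 5pm.",
    "Rotating a key invalidates the old key after a grace period.",
]

# Score each (query, document) pair jointly -- the step bi-encoders skip,
# and the reason reranking is slower but more precise.
scores = model.predict([(query, doc) for doc in candidates])

# Keep the top_n highest-scoring documents (top_n = 2 here).
top_n = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)[:2]
for doc, score in top_n:
    print(f"{score:.3f}  {doc}")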

Two-Stage Retrieval Pattern

The recommended pattern is fast recall followed by precise reranking:
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  }
]
Why this works:
  1. Search stage: Fast, retrieves 100 candidates (< 20ms)
  2. Rerank stage: Slower but precise, picks best 10 (50-100ms)
  3. Total: High-quality results in 70-120ms
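
The same division of labor can be sketched in plain Python. Everything below is a toy illustration, not the pipeline's implementation: the bi-encoder model name, the in-memory corpus, and the use of sentence-transformers are assumptions.

from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = [
    "Reset your password from the account settings page.",
    "Passwords must contain at least twelve characters.",
    "Invoices are emailed on the first day of each month.",
]

# Stage 1: bi-encoder recall. Document embeddings can be pre-computed, so
# comparing the query against the whole corpus is cheap; keep a generous
# candidate pool (top_k = 100).
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
query = "I forgot my password, how do I reset it?"
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=100)[0]

# Stage 2: cross-encoder rerank. Only the recalled candidates are scored
# jointly with the query; keep the best top_n = 10 for display or LLM context.
cross_encoder = CrossEncoder("BAAI/bge-reranker-v2-m3")
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = cross_encoder.predict(pairs)
reranked = sorted(zip(hits, scores), key=lambda pair: pair[1], reverse=True)[:10]
for hit, score in reranked:
    print(f"{score:.3f}  {corpus[hit['corpus_id']]}")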

Performance

| Metric | Value |
| --- | --- |
| Latency | 50-100ms (depends on candidate count) |
| Optimal input size | 50-200 documents |
| Maximum practical | ~500 documents |
| Batching | Automatic |
Reranking 1000+ documents is not recommended. Use top_k limits in the search stage to control candidate pool size.

Output

Each returned document includes:
| Field | Type | Description |
| --- | --- | --- |
| document_id | string | Unique document identifier |
| score | float | Reranker relevance score |
| original_score | float | Score from previous stage |
| rerank_position | integer | Position after reranking |
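
As a concrete (hypothetical) example, reranked results carrying these fields might look like the snippet below; only the four field names come from the table above, while the values and the surrounding list are invented.

# Hypothetical reranked results using the documented output fields.
results = [
    {"document_id": "doc_42", "score": 0.91, "original_score": 0.63, "rerank_position": 1},
    {"document_id": "doc_07", "score": 0.84, "original_score": 0.71, "rerank_position": 2},
]

# Results arrive ordered by rerank_position; keeping the previous stage's
# score alongside the reranker score is useful for debugging ranking changes.
for r in results:
    print(r["rerank_position"], r["document_id"], r["score"], r["original_score"])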

Common Pipeline Patterns

Search + Rerank + Limit

[
  {
    "stage_type": "filter",
    "stage_id": "hybrid_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 20
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "limit",
    "parameters": {
      "limit": 5
    }
  }
]

Search + Filter + Rerank

[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 200
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "structured_filter",
    "parameters": {
      "conditions": {
        "field": "metadata.category",
        "operator": "eq",
        "value": "{{INPUT.category}}"
      }
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  }
]

Trade-offs

| Aspect | Impact |
| --- | --- |
| Higher precision | Better relevance scoring |
| Higher latency | 50-100ms per batch |
| Limited scale | Best for < 500 candidates |
| API costs | Per-document scoring |