[Figure: Rerank stage showing a cross-encoder model re-scoring search results]
The Rerank stage uses cross-encoder models to re-score and reorder search results. Unlike bi-encoder models (used in semantic search), cross-encoders process the query and document together, enabling more accurate relevance scoring at the cost of higher latency.
Stage Category: SORT (reorders documents)
Transformation: N documents → top_n documents (re-ranked by relevance)

When to Use

| Use Case | Description |
| --- | --- |
| Two-stage retrieval | Fast recall (search) + precise ranking (rerank) |
| High-precision requirements | When ranking quality is critical |
| Top-N optimization | Improve quality of final displayed results |
| RAG applications | Better context selection for LLM generation |

When NOT to Use

| Scenario | Recommended Alternative |
| --- | --- |
| Large result sets (1000+) | Too slow; use sort_by_field |
| Real-time requirements (< 20ms) | Use search scores directly |
| Simple attribute sorting | sort_by_field |

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | Required | Reranker model to use |
| top_n | integer | 10 | Number of results to return after reranking |
| query | string | {{INPUT.query}} | Query for relevance scoring |

Available Models

| Model | Speed | Quality | Best For |
| --- | --- | --- | --- |
| bge-reranker-v2-m3 | Fast | High | General purpose, multilingual |
| cohere-rerank-v3 | Medium | Highest | Maximum accuracy |
| jina-reranker-v2 | Fast | High | Multilingual, long documents |

Configuration Examples

{
  "stage_type": "sort",
  "stage_id": "rerank",
  "parameters": {
    "model": "bge-reranker-v2-m3",
    "top_n": 10
  }
}

How Cross-Encoders Work

| Bi-Encoder (Search) | Cross-Encoder (Rerank) |
| --- | --- |
| Query and doc encoded separately | Query + doc encoded together |
| Pre-compute doc embeddings | Must process each query-doc pair |
| Fast (< 10ms for millions of docs) | Slower (50-100ms for 100 docs) |
| Good approximate ranking | Precise relevance scoring |
Cross-encoders see the full context of both query and document together, enabling better understanding of semantic relationships.
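
For intuition, the same joint scoring can be reproduced outside the pipeline. The sketch below assumes the open-source sentence-transformers library and the public BAAI/bge-reranker-v2-m3 checkpoint; the hosted rerank stage runs the model server-side, so this is only an illustration of the scoring step.

from sentence_transformers import CrossEncoder

# Load a cross-encoder checkpoint (assumed here; any supported reranker works).
model = CrossEncoder("BAAI/bge-reranker-v2-m3")

query = "how do I rotate an API key?"
candidates = [
    "API keys can be rotated from the security settings page.",
    "Our office hours are Monday to Friday, 9am to 5pm.",
    "Rotating a key invalidates the old key after a grace period.",
]

# Score each (query, document) pair jointly -- the step bi-encoders skip,
# and the reason reranking is slower but more precise.
scores = model.predict([(query, doc) for doc in candidates])

# Keep the top_n highest-scoring documents (top_n = 2 here).
top_n = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)[:2]
for doc, score in top_n:
    print(f"{score:.3f}  {doc}")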

Two-Stage Retrieval Pattern

The recommended pattern is fast recall followed by precise reranking:
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  }
]
Why this works:
  1. Search stage: Fast, retrieves 100 candidates (< 20ms)
  2. Rerank stage: Slower but precise, picks best 10 (50-100ms)
  3. Total: High-quality results in 70-120ms
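
The same division of labor can be sketched in plain Python. Everything below is a toy illustration, not the pipeline's implementation: the bi-encoder model name, the in-memory corpus, and the use of sentence-transformers are assumptions.

from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = [
    "Reset your password from the account settings page.",
    "Passwords must contain at least twelve characters.",
    "Invoices are emailed on the first day of each month.",
]

# Stage 1: bi-encoder recall. Document embeddings can be pre-computed, so
# comparing the query against the whole corpus is cheap; keep a generous
# candidate pool (top_k = 100).
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
query = "I forgot my password, how do I reset it?"
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=100)[0]

# Stage 2: cross-encoder rerank. Only the recalled candidates are scored
# jointly with the query; keep the best top_n = 10 for display or LLM context.
cross_encoder = CrossEncoder("BAAI/bge-reranker-v2-m3")
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = cross_encoder.predict(pairs)
reranked = sorted(zip(hits, scores), key=lambda pair: pair[1], reverse=True)[:10]
for hit, score in reranked:
    print(f"{score:.3f}  {corpus[hit['corpus_id']]}")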

Performance

| Metric | Value |
| --- | --- |
| Latency | 50-100ms (depends on candidate count) |
| Optimal input size | 50-200 documents |
| Maximum practical | ~500 documents |
| Batching | Automatic |
Reranking 1000+ documents is not recommended. Use top_k limits in the search stage to control candidate pool size.

Output

Each returned document includes:
| Field | Type | Description |
| --- | --- | --- |
| document_id | string | Unique document identifier |
| score | float | Reranker relevance score |
| original_score | float | Score from previous stage |
| rerank_position | integer | Position after reranking |
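
As a concrete (hypothetical) example, reranked results carrying these fields might look like the snippet below; only the four field names come from the table above, while the values and the surrounding list are invented.

# Hypothetical reranked results using the documented output fields.
results = [
    {"document_id": "doc_42", "score": 0.91, "original_score": 0.63, "rerank_position": 1},
    {"document_id": "doc_07", "score": 0.84, "original_score": 0.71, "rerank_position": 2},
]

# Results arrive ordered by rerank_position; keeping the previous stage's
# score alongside the reranker score is useful for debugging ranking changes.
for r in results:
    print(r["rerank_position"], r["document_id"], r["score"], r["original_score"])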

Common Pipeline Patterns

Search + Rerank + Limit

[
  {
    "stage_type": "filter",
    "stage_id": "hybrid_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 20
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "limit",
    "parameters": {
      "limit": 5
    }
  }
]

Search + Filter + Rerank

[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 200
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "structured_filter",
    "parameters": {
      "conditions": {
        "field": "metadata.category",
        "operator": "eq",
        "value": "{{INPUT.category}}"
      }
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  }
]

Trade-offs

| Aspect | Impact |
| --- | --- |
| Higher precision | Better relevance scoring |
| Higher latency | 50-100ms per batch |
| Limited scale | Best for < 500 candidates |
| API costs | Per-document scoring |