The RAG Prepare stage formats search results for LLM consumption by managing token budgets, formatting documents, and adding citations. This is a preparation stage that does NOT call an LLM; it prepares content for downstream LLM stages or external LLM calls.
Stage Category: APPLY

Transformation:
  • single_context mode: N documents → 1 combined context document
  • formatted_list mode: N documents → N formatted documents

When to Use

| Use Case | Description |
|---|---|
| Before LLM generation | Prepare context for summarization or Q&A |
| Token budget management | Fit multiple docs into the context window |
| Citation tracking | Enable source attribution in responses |
| Consistent formatting | Standardize document format for LLM input |

When NOT to Use

| Scenario | Recommended Alternative |
|---|---|
| Want the LLM to generate a summary | summarize stage (calls an LLM) |
| Don't need token management | Pass documents directly |
| Simple pass-through | Skip this stage |

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_tokens` | integer | `8000` | Maximum tokens for combined output |
| `tokenizer` | string | `cl100k_base` | Tokenizer to use (GPT-4 compatible) |
| `truncation_strategy` | string | `priority_truncate` | How to handle token overflow |
| `output_mode` | string | `single_context` | Output format |
| `document_template` | string | `[{{CONTEXT.INDEX}}] {{DOC.content}}\n\n` | Template for each document |
| `content_field` | string | `content` | Field to extract content from |
| `separator` | string | `\n` | Separator between documents |
| `citation` | object | `{"style": "numbered"}` | Citation configuration |

Truncation Strategies

| Strategy | Behavior |
|---|---|
| `priority_truncate` | Include docs in score order; truncate the last to fit |
| `proportional` | Give each doc a proportional share of the token budget |
| `drop_last` | Include complete docs until the limit; drop the remainder |

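The three strategies can be sketched as follows. This is a minimal illustration, not the stage's actual implementation: `count_tokens` here is a stand-in that counts whitespace-separated words, whereas the real stage counts tokens with tiktoken.

```python
def count_tokens(text: str) -> int:
    # Stand-in for tokenizer-based counting: one "token" per word.
    return len(text.split())

def priority_truncate(docs: list[str], max_tokens: int) -> list[str]:
    """Include docs in order (assumed pre-sorted by score); truncate the last to fit."""
    out, used = [], 0
    for doc in docs:
        n = count_tokens(doc)
        if used + n <= max_tokens:
            out.append(doc)
            used += n
        else:
            remaining = max_tokens - used
            if remaining > 0:
                out.append(" ".join(doc.split()[:remaining]))
            break
    return out

def proportional(docs: list[str], max_tokens: int) -> list[str]:
    """Give each doc an equal share of the budget (a simple proportional split)."""
    per_doc = max_tokens // len(docs)
    return [" ".join(doc.split()[:per_doc]) for doc in docs]

def drop_last(docs: list[str], max_tokens: int) -> list[str]:
    """Include complete docs until the limit; drop the remainder entirely."""
    out, used = [], 0
    for doc in docs:
        n = count_tokens(doc)
        if used + n > max_tokens:
            break
        out.append(doc)
        used += n
    return out
```

Note how `priority_truncate` keeps a partial copy of the first overflowing document, while `drop_last` discards it entirely.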
Output Modes

| Mode | Output | Use Case |
|---|---|---|
| `single_context` | 1 document with a combined context string | Direct LLM input |
| `formatted_list` | N documents with a `formatted_content` field | Custom processing |

Configuration Examples

```json
{
  "stage_type": "apply",
  "stage_id": "rag_prepare",
  "parameters": {
    "max_tokens": 8000,
    "output_mode": "single_context"
  }
}
```

Template Placeholders

| Placeholder | Description |
|---|---|
| `{{CONTEXT.INDEX}}` | 1-based position in the result set (1, 2, 3…) |
| `{{CONTEXT.CITATION}}` | Citation marker based on `citation.style` |
| `{{DOC.*}}` | Any document field (e.g., `{{DOC.content}}`, `{{DOC.metadata.title}}`) |
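A rough sketch of how such placeholders might be substituted for one document. The `render_template` helper and its exact semantics are illustrative, not the stage's real code; unknown placeholders are simply left untouched here.

```python
import re

def render_template(template: str, doc: dict, index: int, citation: str) -> str:
    """Fill {{CONTEXT.*}} and {{DOC.*}} placeholders for one document."""
    def lookup(path: str):
        # Walk dotted paths like "metadata.title" through nested dicts.
        value = doc
        for key in path.split("."):
            value = value[key]
        return value

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name == "CONTEXT.INDEX":
            return str(index)
        if name == "CONTEXT.CITATION":
            return citation
        if name.startswith("DOC."):
            return str(lookup(name[len("DOC."):]))
        return match.group(0)  # leave unknown placeholders as-is

    return re.sub(r"\{\{([^}]+)\}\}", substitute, template)
```

For example, rendering `[{{CONTEXT.INDEX}}] {{DOC.metadata.title}}: {{DOC.content}}` against `{"content": "Body", "metadata": {"title": "T"}}` at index 1 yields `[1] T: Body`.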

Citation Styles

| Style | Output | Example |
|---|---|---|
| `numbered` | `[1]`, `[2]`, `[3]` | Default, clean |
| `bracketed` | `[doc_id]` | Document ID references |
| `footnote` | Superscript numbers | Academic style |
| `none` | No citations | When not needed |
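A minimal sketch of generating markers per style; the function name is hypothetical, and the footnote style is assumed to use Unicode superscript digits.

```python
def citation_marker(style: str, index: int, document_id: str) -> str:
    """Produce the citation marker for one document under a given style."""
    if style == "numbered":
        return f"[{index}]"
    if style == "bracketed":
        return f"[{document_id}]"
    if style == "footnote":
        # Assumed rendering: map each digit to its Unicode superscript.
        superscripts = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
        return str(index).translate(superscripts)
    return ""  # style == "none"
```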

Output Schema

single_context Mode

```json
{
  "rag_context": "[1] First document content...\n\n[2] Second document content...",
  "citations": [
    {"index": 1, "title": "Document Title", "document_id": "doc_123"},
    {"index": 2, "title": "Another Title", "document_id": "doc_456"}
  ]
}
```

formatted_list Mode

Each document gets:

```json
{
  "document_id": "doc_123",
  "formatted_content": "[1] Title\nContent here...",
  "original_content": "Content here...",
  "metadata": {...}
}
```

Performance

| Metric | Value |
|---|---|
| Latency | < 10ms |
| Token counting | Uses tiktoken (accurate) |
| LLM calls | None (pure formatting) |

This stage does NOT call an LLM. It only formats content for LLM consumption. Use the summarize stage if you want LLM-generated summaries.

Common Pipeline Patterns

Search + Prepare + External LLM

```json
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 50
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "rag_prepare",
    "parameters": {
      "max_tokens": 8000,
      "output_mode": "single_context",
      "citation": {"style": "numbered"}
    }
  }
]
```

The output rag_context can then be passed to an external LLM call.
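One way to hand the result to an external LLM is to wrap `rag_context` in a chat-style message list. The message shape below matches common chat-completion APIs, but the helper and the system-prompt wording are illustrative, not part of the stage.

```python
def build_llm_messages(rag_context: str, query: str) -> list[dict]:
    """Wrap the pipeline's rag_context in a chat-style message list."""
    return [
        {
            "role": "system",
            "content": "Answer using only the numbered sources below. "
                       "Cite sources as [n].\n\n" + rag_context,
        },
        {"role": "user", "content": query},
    ]
```

Because the context already carries `[1]`, `[2]`-style markers, the LLM's answer can cite sources that map back to the `citations` array.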

vs Summarize Stage

| Feature | rag_prepare | summarize |
|---|---|---|
| Calls LLM | No | Yes |
| Output | Formatted context | Generated summary |
| Latency | < 10ms | 500-2000ms |
| Cost | Free | LLM API costs |
| Use case | Prepare for external LLM | End-to-end RAG |