The RAG Prepare stage formats search results for LLM consumption by managing token budgets, formatting documents, and adding citations. It does NOT call an LLM itself: it prepares content for downstream LLM stages or external LLM calls.
Stage Category: APPLY

Transformation:

- single_context mode: N documents → 1 combined context document
- formatted_list mode: N documents → N formatted documents
When to Use
| Use Case | Description |
|---|---|
| Before LLM generation | Prepare context for summarization or Q&A |
| Token budget management | Fit multiple docs into the context window |
| Citation tracking | Enable source attribution in responses |
| Consistent formatting | Standardize document format for LLM input |
When NOT to Use
| Scenario | Recommended Alternative |
|---|---|
| Want LLM to generate summary | summarize stage (calls LLM) |
| Don't need token management | Pass documents directly |
| Simple pass-through | Skip this stage |
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_tokens | integer | 8000 | Maximum tokens for combined output |
| tokenizer | string | `cl100k_base` | Tokenizer to use (GPT-4 compatible) |
| truncation_strategy | string | `priority_truncate` | How to handle token overflow |
| output_mode | string | `single_context` | Output format |
| document_template | string | `[{{CONTEXT.INDEX}}] {{DOC.content}}\n\n` | Template for each document |
| content_field | string | `content` | Field to extract content from |
| separator | string | `\n` | Separator between documents |
| citation | object | `{style: "numbered"}` | Citation configuration |
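As a reference point, here is a configuration that spells out every parameter at its documented default (equivalent to omitting them all; the surrounding `stage_type`/`stage_id`/`parameters` envelope follows the examples later in this page):

```json
{
  "stage_type": "apply",
  "stage_id": "rag_prepare",
  "parameters": {
    "max_tokens": 8000,
    "tokenizer": "cl100k_base",
    "truncation_strategy": "priority_truncate",
    "output_mode": "single_context",
    "document_template": "[{{CONTEXT.INDEX}}] {{DOC.content}}\n\n",
    "content_field": "content",
    "separator": "\n",
    "citation": { "style": "numbered" }
  }
}
```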
Truncation Strategies
| Strategy | Behavior |
|---|---|
| priority_truncate | Include docs in score order, truncate the last one to fit |
| proportional | Give each doc a proportional token budget |
| drop_last | Include complete docs until the limit, drop the remainder |
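For example, to guarantee that only complete documents ever reach the LLM (at the cost of possibly including fewer of them), select drop_last; the `max_tokens` value here is illustrative:

```json
{
  "stage_type": "apply",
  "stage_id": "rag_prepare",
  "parameters": {
    "max_tokens": 4000,
    "truncation_strategy": "drop_last"
  }
}
```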
Output Modes
| Mode | Output | Use Case |
|---|---|---|
| single_context | 1 document with combined context string | Direct LLM input |
| formatted_list | N documents with formatted_content field | Custom processing |
Configuration Examples
Basic RAG Context
With Numbered Citations
Large Context Window (GPT-4 Turbo)
Formatted List Mode
Custom Template with URL
```json
{
  "stage_type": "apply",
  "stage_id": "rag_prepare",
  "parameters": {
    "max_tokens": 8000,
    "output_mode": "single_context"
  }
}
```
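The JSON above corresponds to the Basic RAG Context example. The remaining examples named above can be sketched as follows; parameter values beyond those documented in the Parameters table (for instance `{{DOC.metadata.url}}` and the 100000-token budget) are illustrative assumptions, not guaranteed defaults.

With Numbered Citations:

```json
{
  "stage_type": "apply",
  "stage_id": "rag_prepare",
  "parameters": {
    "max_tokens": 8000,
    "output_mode": "single_context",
    "citation": { "style": "numbered" },
    "document_template": "[{{CONTEXT.CITATION}}] {{DOC.content}}\n\n"
  }
}
```

Large Context Window (GPT-4 Turbo):

```json
{
  "stage_type": "apply",
  "stage_id": "rag_prepare",
  "parameters": {
    "max_tokens": 100000,
    "tokenizer": "cl100k_base",
    "output_mode": "single_context"
  }
}
```

Formatted List Mode:

```json
{
  "stage_type": "apply",
  "stage_id": "rag_prepare",
  "parameters": {
    "output_mode": "formatted_list",
    "document_template": "[{{CONTEXT.INDEX}}] {{DOC.metadata.title}}\n{{DOC.content}}"
  }
}
```

Custom Template with URL:

```json
{
  "stage_type": "apply",
  "stage_id": "rag_prepare",
  "parameters": {
    "output_mode": "single_context",
    "document_template": "[{{CONTEXT.INDEX}}] {{DOC.metadata.title}} ({{DOC.metadata.url}})\n{{DOC.content}}\n\n"
  }
}
```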
Template Placeholders
| Placeholder | Description |
|---|---|
| `{{CONTEXT.INDEX}}` | 1-based position in result set (1, 2, 3…) |
| `{{CONTEXT.CITATION}}` | Citation marker based on citation.style |
| `{{DOC.*}}` | Any document field (e.g., `{{DOC.content}}`, `{{DOC.metadata.title}}`) |
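For instance, a template combining both placeholder families (the `metadata.title` field is taken from the example above; whether your documents carry it depends on your schema):

```json
{
  "document_template": "{{CONTEXT.CITATION}} {{DOC.metadata.title}}\n{{DOC.content}}\n\n"
}
```

With the default numbered citation style, the first document would render with a leading `[1]` marker followed by its title and content.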
Citation Styles
| Style | Output | Notes |
|---|---|---|
| numbered | [1], [2], [3] | Default, clean |
| bracketed | [doc_id] | Document ID references |
| footnote | Superscript numbers | Academic style |
| none | No citations | When not needed |
Output Schema
single_context Mode
```json
{
  "rag_context": "[1] First document content...\n\n[2] Second document content...",
  "citations": [
    { "index": 1, "title": "Document Title", "document_id": "doc_123" },
    { "index": 2, "title": "Another Title", "document_id": "doc_456" }
  ]
}
```
formatted_list Mode

Each document gets:
```json
{
  "document_id": "doc_123",
  "formatted_content": "[1] Title\nContent here...",
  "original_content": "Content here...",
  "metadata": { ... }
}
```
Performance

| Metric | Value |
|---|---|
| Latency | < 10ms |
| Token counting | Uses tiktoken (accurate) |
| LLM calls | None (pure formatting) |
This stage does NOT call an LLM. It only formats content for LLM consumption. Use the summarize stage if you want LLM-generated summaries.
Common Pipeline Patterns
Search + Prepare + External LLM
```json
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 50
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "rag_prepare",
    "parameters": {
      "max_tokens": 8000,
      "output_mode": "single_context",
      "citation": { "style": "numbered" }
    }
  }
]
```
The output rag_context can then be passed to an external LLM call.
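As a sketch, assuming an OpenAI-style chat completions payload (the model name, endpoint shape, and prompt wording are assumptions, not part of this stage), the prepared context can be interpolated into the request like this:

```json
{
  "model": "gpt-4-turbo",
  "messages": [
    {
      "role": "system",
      "content": "Answer using only the provided context. Cite sources by their [n] markers."
    },
    {
      "role": "user",
      "content": "Context:\n{{rag_context}}\n\nQuestion: {{INPUT.query}}"
    }
  ]
}
```

The citations array from the stage output can then be used to map the `[n]` markers in the model's answer back to `document_id`s.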
vs Summarize Stage
| Feature | rag_prepare | summarize |
|---|---|---|
| Calls LLM | No | Yes |
| Output | Formatted context | Generated summary |
| Latency | < 10ms | 500-2000ms |
| Cost | Free | LLM API costs |
| Use case | Prepare for external LLM | End-to-end RAG |