The Summarize stage uses language models to generate summaries from document sets. It can produce a single summary from multiple documents, generate per-document summaries, or answer questions based on the retrieved content.
Stage Category: REDUCE (aggregates documents)
Transformation: N documents → 1 summary document (or N documents with summaries)
When to Use
| Use Case | Description |
| --- | --- |
| RAG summarization | Generate answers from search results |
| Document synthesis | Combine multiple sources into one summary |
| Key points extraction | Distill long documents to essentials |
| Question answering | Answer user questions from retrieved docs |
When NOT to Use
| Scenario | Recommended Alternative |
| --- | --- |
| Just formatting for LLM | rag_prepare (no LLM call) |
| Extracting structured data | llm_enrichment |
| Real-time low-latency | Pre-compute summaries |
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | string | *Required* | LLM model to use |
| `prompt` | string | *Required* | Summarization instructions |
| `content_field` | string | `content` | Field containing text to summarize |
| `mode` | string | `aggregate` | `aggregate` (all → 1) or `per_document` |
| `max_input_tokens` | integer | `8000` | Max tokens to send to LLM |
| `include_citations` | boolean | `true` | Add source citations to summary |
| `output_field` | string | `summary` | Field for summary output |
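For reference, a configuration that sets every parameter explicitly (the prompt and values are illustrative, not recommendations):

```json
{
  "stage_type": "reduce",
  "stage_id": "summarize",
  "parameters": {
    "model": "gpt-4o-mini",
    "prompt": "Summarize the key points of the provided documents.",
    "content_field": "content",
    "mode": "aggregate",
    "max_input_tokens": 8000,
    "include_citations": true,
    "output_field": "summary"
  }
}
```

Any parameter other than `model` and `prompt` can be omitted, in which case the defaults above apply.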
Available Models
| Model | Speed | Quality | Context | Cost |
| --- | --- | --- | --- | --- |
| `gpt-4o-mini` | Fast | Good | 128K | Low |
| `gpt-4o` | Medium | Excellent | 128K | Medium |
| `claude-3-haiku` | Fast | Good | 200K | Low |
| `claude-3-sonnet` | Medium | Excellent | 200K | Medium |
| `claude-3-opus` | Slow | Best | 200K | High |
Configuration Examples
Basic RAG Summary
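The example body for this variant is missing; a minimal sketch using only the required parameters (prompt wording is illustrative), with everything else left at its default:

```json
{
  "stage_type": "reduce",
  "stage_id": "summarize",
  "parameters": {
    "model": "gpt-4o-mini",
    "prompt": "Summarize the provided documents as they relate to: {{INPUT.query}}"
  }
}
```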
Detailed Summary
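The example body for this variant is missing; a plausible sketch (the prompt, model choice, and token budget are illustrative):

```json
{
  "stage_type": "reduce",
  "stage_id": "summarize",
  "parameters": {
    "model": "gpt-4o",
    "prompt": "Write a detailed, well-structured summary of the provided documents, covering all major points.",
    "max_input_tokens": 16000,
    "include_citations": true
  }
}
```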
Per-Document Summaries
Executive Brief
Q&A with Sources
```json
{
  "stage_type": "reduce",
  "stage_id": "summarize",
  "parameters": {
    "model": "gpt-4o-mini",
    "prompt": "Based on the provided documents, answer the user's question: {{INPUT.query}}",
    "mode": "aggregate",
    "include_citations": true
  }
}
```
Modes
Aggregate Mode (default)
Combines all documents into a single summary:
[Doc1, Doc2, Doc3] → "Combined summary of all documents..."
Per-Document Mode
Creates a summary for each document:
[Doc1, Doc2, Doc3] → [Doc1 + summary, Doc2 + summary, Doc3 + summary]
Output Schema
Aggregate Mode
```json
{
  "summary": "Based on the documents, the answer is...\n\n[1] First source mentioned...\n[2] Second source confirmed...",
  "citations": [
    { "index": 1, "document_id": "doc_123", "title": "Source Document 1" },
    { "index": 2, "document_id": "doc_456", "title": "Source Document 2" }
  ],
  "model": "gpt-4o-mini",
  "tokens_used": 1250
}
```
Per-Document Mode
Each document includes:
```json
{
  "document_id": "doc_123",
  "content": "Original content...",
  "document_summary": "This document discusses...",
  "metadata": { ... }
}
```
Performance

| Metric | Value |
| --- | --- |
| Latency | 500-2000ms |
| Token usage | Depends on input size |
| Max input | Model context window |
| Streaming | Supported |
Summarization calls the LLM and incurs API costs. Use rag_prepare if you only need to format content for external LLM calls.
Common Pipeline Patterns
Full RAG Pipeline
```json
[
  {
    "stage_type": "filter",
    "stage_id": "hybrid_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 50
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "summarize",
    "parameters": {
      "model": "gpt-4o",
      "prompt": "Answer the user's question based on the provided documents: {{INPUT.query}}",
      "include_citations": true
    }
  }
]
```
Multi-Document Synthesis
```json
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.topic}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 20
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "structured_filter",
    "parameters": {
      "conditions": {
        "field": "metadata.type",
        "operator": "eq",
        "value": "research_paper"
      }
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "summarize",
    "parameters": {
      "model": "claude-3-sonnet",
      "prompt": "Synthesize the research findings from these papers on {{INPUT.topic}}. Identify common themes, contradictions, and gaps in the research.",
      "max_input_tokens": 32000
    }
  }
]
```
Preview Summaries
```json
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 10
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "summarize",
    "parameters": {
      "model": "gpt-4o-mini",
      "prompt": "Create a one-sentence summary of this document.",
      "mode": "per_document",
      "output_field": "preview"
    }
  }
]
```
Comparison: summarize vs rag_prepare
| Feature | summarize | rag_prepare |
| --- | --- | --- |
| Calls LLM | Yes | No |
| Output | Generated summary | Formatted context |
| Latency | 500-2000ms | < 10ms |
| Cost | LLM API costs | Free |
| Use case | End-to-end RAG | Prepare for external LLM |
Error Handling
| Error | Behavior |
| --- | --- |
| Token limit exceeded | Truncates input, continues |
| LLM timeout | Retry once, then fail |
| Rate limit | Automatic backoff |
| Empty input | Returns empty summary |