[Image: Summarize stage showing LLM-powered document summarization]
The Summarize stage uses language models to generate summaries from document sets. It can produce a single summary from multiple documents, generate a summary for each document, or answer questions based on the retrieved content.
Stage Category: REDUCE (Aggregates documents)
Transformation: N documents → 1 summary document (or N documents with summaries)

When to Use

| Use Case | Description |
| --- | --- |
| RAG summarization | Generate answers from search results |
| Document synthesis | Combine multiple sources into one summary |
| Key points extraction | Distill long documents to essentials |
| Question answering | Answer user questions from retrieved docs |

When NOT to Use

| Scenario | Recommended Alternative |
| --- | --- |
| Just formatting for LLM | rag_prepare (no LLM call) |
| Extracting structured data | llm_enrichment |
| Real-time low-latency | Pre-compute summaries |

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | Required | LLM model to use |
| prompt | string | Required | Summarization instructions |
| content_field | string | content | Field containing text to summarize |
| mode | string | aggregate | aggregate (all → 1) or per_document |
| max_input_tokens | integer | 8000 | Max tokens to send to the LLM |
| include_citations | boolean | true | Add source citations to the summary |
| output_field | string | summary | Field for the summary output |
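
For reference, here is a configuration that sets every parameter explicitly. The model and prompt values are illustrative; the remaining values are simply the documented defaults.

```json
{
  "stage_type": "reduce",
  "stage_id": "summarize",
  "parameters": {
    "model": "gpt-4o-mini",
    "prompt": "Summarize the key points of these documents.",
    "content_field": "content",
    "mode": "aggregate",
    "max_input_tokens": 8000,
    "include_citations": true,
    "output_field": "summary"
  }
}
```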

Available Models

| Model | Speed | Quality | Context | Cost |
| --- | --- | --- | --- | --- |
| gpt-4o-mini | Fast | Good | 128K | Low |
| gpt-4o | Medium | Excellent | 128K | Medium |
| claude-3-haiku | Fast | Good | 200K | Low |
| claude-3-sonnet | Medium | Excellent | 200K | Medium |
| claude-3-opus | Slow | Best | 200K | High |
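
Model choice interacts with max_input_tokens: a 200K-context model can accept a much larger input budget than the 8000-token default. A sketch pairing a fast, long-context model with a raised budget (the prompt and token values are illustrative):

```json
{
  "stage_type": "reduce",
  "stage_id": "summarize",
  "parameters": {
    "model": "claude-3-haiku",
    "prompt": "Summarize the key findings across these documents.",
    "max_input_tokens": 32000
  }
}
```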

Configuration Example

A basic question-answering configuration that aggregates all retrieved documents into one cited answer:

```json
{
  "stage_type": "reduce",
  "stage_id": "summarize",
  "parameters": {
    "model": "gpt-4o-mini",
    "prompt": "Based on the provided documents, answer the user's question: {{INPUT.query}}",
    "mode": "aggregate",
    "include_citations": true
  }
}
```

Modes

Aggregate Mode (default)

Combines all documents into a single summary:
[Doc1, Doc2, Doc3] → "Combined summary of all documents..."

Per-Document Mode

Creates a summary for each document:
[Doc1, Doc2, Doc3] → [Doc1 + summary, Doc2 + summary, Doc3 + summary]
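
A minimal sketch of a per_document configuration; aside from the illustrative prompt, only mode differs from the aggregate example above:

```json
{
  "stage_type": "reduce",
  "stage_id": "summarize",
  "parameters": {
    "model": "gpt-4o-mini",
    "prompt": "Summarize this document in two sentences.",
    "mode": "per_document"
  }
}
```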

Output Schema

Aggregate Mode

```json
{
  "summary": "Based on the documents, the answer is...\n\n[1] First source mentioned...\n[2] Second source confirmed...",
  "citations": [
    {"index": 1, "document_id": "doc_123", "title": "Source Document 1"},
    {"index": 2, "document_id": "doc_456", "title": "Source Document 2"}
  ],
  "model": "gpt-4o-mini",
  "tokens_used": 1250
}
```

Per-Document Mode

Each document includes:
```json
{
  "document_id": "doc_123",
  "content": "Original content...",
  "document_summary": "This document discusses...",
  "metadata": {...}
}
```

Performance

| Metric | Value |
| --- | --- |
| Latency | 500-2000 ms |
| Token usage | Depends on input size |
| Max input | Model context window |
| Streaming | Supported |
Summarization calls the LLM and incurs API costs. Use rag_prepare if you only need to format content for external LLM calls.

Common Pipeline Patterns

Full RAG Pipeline

```json
[
  {
    "stage_type": "filter",
    "stage_id": "hybrid_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 50
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "summarize",
    "parameters": {
      "model": "gpt-4o",
      "prompt": "Answer the user's question based on the provided documents: {{INPUT.query}}",
      "include_citations": true
    }
  }
]
```
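
Reranking down to 10 documents before the summarize stage keeps the LLM input small, which reduces token usage and helps keep latency toward the low end of the 500-2000 ms range.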

Multi-Document Synthesis

```json
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.topic}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 20
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "structured_filter",
    "parameters": {
      "conditions": {
        "field": "metadata.type",
        "operator": "eq",
        "value": "research_paper"
      }
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "summarize",
    "parameters": {
      "model": "claude-3-sonnet",
      "prompt": "Synthesize the research findings from these papers on {{INPUT.topic}}. Identify common themes, contradictions, and gaps in the research.",
      "max_input_tokens": 32000
    }
  }
]
```
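
Raising max_input_tokens to 32000 (from the 8000 default) lets more of the 20 retrieved papers reach the model, well within claude-3-sonnet's 200K context window.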

Preview Summaries

```json
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 10
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "summarize",
    "parameters": {
      "model": "gpt-4o-mini",
      "prompt": "Create a one-sentence summary of this document.",
      "mode": "per_document",
      "output_field": "preview"
    }
  }
]
```
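
Because output_field is set to preview, each document in the result carries its one-sentence summary in a preview field rather than the default output field.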

Comparison: summarize vs rag_prepare

| Feature | summarize | rag_prepare |
| --- | --- | --- |
| Calls LLM | Yes | No |
| Output | Generated summary | Formatted context |
| Latency | 500-2000 ms | < 10 ms |
| Cost | LLM API costs | Free |
| Use case | End-to-end RAG | Prepare for external LLM |
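
As a rule of thumb: use summarize when the pipeline itself should return the final answer, and rag_prepare when a downstream system makes the LLM call and only needs formatted context.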

Error Handling

| Error | Behavior |
| --- | --- |
| Token limit exceeded | Truncates input, continues |
| LLM timeout | Retries once, then fails |
| Rate limit | Automatic backoff |
| Empty input | Returns empty summary |