JSON Transform stage showing Jinja2 template transformation of documents
The JSON Transform stage applies a Jinja2 template to each document, rendering the template with full document context and replacing the document with the parsed JSON output. Use this to reformat documents for external APIs or reshape data for downstream consumers.
Stage Category: APPLY (1-1 Transformation)
Transformation: N documents → N documents (or fewer with fail_on_error=false)

When to Use

Use Case                 Description
External API formatting  Format documents for webhook payloads
Response optimization    Remove unused fields to reduce bandwidth
Schema adaptation        Convert internal format to client-specific format
Conditional outputs      Include fields based on document properties
Array flattening         Transform nested structures to flat arrays
Field renaming           Rename or reorganize document fields

When NOT to Use

Scenario                 Recommended Alternative
Filtering documents      structured_filter or llm_filter
Sorting documents        sort_by_field or rerank
Enriching with new data  document_enrich or api_call
Joining external data    taxonomy_enrich

Parameters

Parameter      Type     Default   Description
template       string   Required  Jinja2 template that must render to valid JSON
fail_on_error  boolean  false     Fail the entire pipeline on a transformation error

Template Context

Templates have access to the full retriever execution context:
Namespace          Description                                Example
DOC / doc          Current document fields and metadata       {{ DOC.document_id }}
INPUT / inputs     Original query inputs from search request  {{ INPUT.query }}
CONTEXT / context  Execution context (namespace_id, etc.)     {{ CONTEXT.namespace_id }}
STAGE / stage      Current stage execution data               {{ STAGE.name }}
Both uppercase and lowercase namespace formats work identically (DOC == doc).

Template Features

Jinja2 Syntax

Feature       Syntax                   Description
Variables     {{ DOC.field }}          Output field values
Conditionals  {% if %}...{% endif %}   Conditional content
Loops         {% for item in items %}  Iterate over arrays
Filters       {{ value | tojson }}     Transform values
Comments      {# comment #}            Template comments

Useful Filters

Filter        Description                         Example
tojson        JSON-safe encoding                  {{ DOC.data | tojson }}
length        Get array/string length             {{ DOC.tags | length }}
default       Fallback value                      {{ DOC.optional | default('N/A') }}
first / last  First or last array element         {{ DOC.items | first }}
join          Join array elements into a string   {{ DOC.tags | join(', ') }}
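
The value of tojson is easiest to see on a string that contains quotes or line breaks. The following is a minimal sketch in plain Python, assuming tojson behaves roughly like json.dumps (Jinja2's built-in filter also escapes a few HTML-sensitive characters). The sample text is made up for illustration.

```python
import json

# Hypothetical field value containing characters that are special in JSON.
text = 'She said "hi"\nand left'

# Hand-quoting, as in "{{ DOC.text }}": the raw value is pasted between quotes.
hand_quoted = '{"content": "' + text + '"}'

# JSON-encoding, as in {{ DOC.text | tojson }}: quotes and newlines are escaped.
encoded = '{"content": ' + json.dumps(text) + '}'


def parses(candidate: str) -> bool:
    """Return True if candidate is valid JSON."""
    try:
        json.loads(candidate)
        return True
    except json.JSONDecodeError:
        return False


hand_quoted_ok = parses(hand_quoted)
encoded_ok = parses(encoded)
print(hand_quoted_ok, encoded_ok)
```

Hand-quoting is only safe for values known to contain no quotes, backslashes, or control characters, such as fixed-format identifiers.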

Configuration Examples

{
  "stage_type": "apply",
  "stage_id": "json_transform",
  "parameters": {
    "template": "{\"id\": \"{{ DOC.document_id }}\", \"content\": {{ DOC.text | tojson }}, \"score\": {{ DOC.score }}}"
  }
}
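
To check a template before deploying it, it can help to render it locally against a sample document. The sketch below uses the jinja2 Python package and a template shaped like the configuration above. It is an approximation, not the stage's actual implementation: the transform helper, the sample document, and the way DOC / doc are passed in are all assumptions made for illustration.

```python
import json

from jinja2 import Environment

# Template shaped like the configuration example; free text goes through tojson.
TEMPLATE = (
    '{"id": {{ DOC.document_id | tojson }}, '
    '"content": {{ DOC.text | tojson }}, '
    '"score": {{ DOC.score }}}'
)


def transform(document: dict, template_source: str = TEMPLATE) -> dict:
    """Render the template for one document and parse the result as JSON."""
    template = Environment().from_string(template_source)
    # Assumption: expose the document under both spellings, mirroring DOC / doc.
    rendered = template.render(DOC=document, doc=document)
    return json.loads(rendered)


# Hypothetical sample document; "internal" is not referenced by the template.
sample = {
    "document_id": "doc_1",
    "text": 'He said "ok"',
    "score": 0.87,
    "internal": "x",
}
result = transform(sample)
print(result)
```

Fields that the template does not reference, such as internal here, are absent from the output, which is what makes the stage usable for dropping fields.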

Error Handling

Setting                         Behavior
fail_on_error: false (default)  Skip failed documents with a warning and continue processing
fail_on_error: true             Fail the entire retrieval on the first transformation error
Common failure causes:
  • Invalid template syntax
  • Template rendering errors (missing fields)
  • Invalid JSON output from template
  • Document missing required fields
Use fail_on_error: false for public APIs where partial results are acceptable. Use fail_on_error: true for internal workflows where data integrity is critical.
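
The two settings can be sketched as a loop over documents. This is an illustrative model of the behavior described above, not the stage's real code: apply_transform, the render callable, and the sample documents are hypothetical.

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("json_transform_sketch")


def apply_transform(
    documents: list[dict],
    render: Callable[[dict], str],
    fail_on_error: bool = False,
) -> list[dict]:
    """Render each document and parse the result as JSON.

    With fail_on_error=False, failing documents are skipped with a warning.
    With fail_on_error=True, the first failure is re-raised.
    """
    results = []
    for doc in documents:
        try:
            results.append(json.loads(render(doc)))
        except Exception as exc:  # rendering error or invalid JSON output
            if fail_on_error:
                raise
            logger.warning("skipping document %r: %s", doc.get("document_id"), exc)
    return results


def render(doc: dict) -> str:
    """Hypothetical stand-in for a template that hand-quotes the title."""
    return '{"id": "%s", "title": "%s"}' % (doc["document_id"], doc["title"])


docs = [
    {"document_id": "a", "title": "Plain title"},
    {"document_id": "b", "title": 'Title with "quotes"'},  # renders to invalid JSON
    {"document_id": "c"},  # missing field: the renderer raises KeyError
]

lenient = apply_transform(docs, render)  # fail_on_error defaults to False
print(lenient)
```

With the default setting only document a survives, which is the "N documents → fewer" case. Passing fail_on_error=True makes the same call raise on document b.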

Performance

Metric      Value
Latency     < 1 ms per document
Processing  Sequential (fast, no caching needed)
Schema      Output completely defined by template

Multi-line Templates

For complex templates, pass the template as a single JSON string in the API call and encode each line break as \n:
curl -X POST "$MP_API_URL/v1/retrievers" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "stages": [{
      "stage_type": "apply",
      "stage_id": "json_transform",
      "parameters": {
        "template": "{\n  \"id\": \"{{ DOC.document_id }}\",\n  \"title\": {{ DOC.title | tojson }},\n  \"items\": [\n    {% for item in DOC.items %}{\n      \"name\": \"{{ item.name }}\",\n      \"value\": {{ item.value }}\n    }{% if not loop.last %},{% endif %}\n    {% endfor %}\n  ]\n}"
      }
    }]
  }'
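
Escaping a long template by hand is error-prone. One alternative is to write the template as ordinary multi-line text and let a JSON serializer produce the request body. The sketch below does this with Python's standard library. The template is a simplified, hypothetical one, and the payload only mirrors the fields shown in the curl example above.

```python
import json

# The template written as readable multi-line text (simplified example).
template = """{
  "id": {{ DOC.document_id | tojson }},
  "title": {{ DOC.title | tojson }},
  "tags": {{ DOC.tags | tojson }}
}"""

# Request body with the same structure as the curl example.
payload = {
    "stages": [
        {
            "stage_type": "apply",
            "stage_id": "json_transform",
            "parameters": {"template": template},
        }
    ]
}

# json.dumps escapes the quotes and turns each line break into \n.
body = json.dumps(payload)
print(body)
```

The resulting body can be saved to a file and sent with curl's -d @file form, or posted with any HTTP client.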

Common Patterns

Drop Unused Fields

{
  "template": "{\"id\": \"{{ DOC.document_id }}\", \"title\": {{ DOC.title | tojson }}, \"url\": \"{{ DOC.url }}\"}"
}

Flatten Nested Metadata

{
  "template": "{\"doc_id\": \"{{ DOC.document_id }}\", \"user_id\": \"{{ DOC.metadata.user_id }}\", \"category\": \"{{ DOC.metadata.category }}\", \"score\": {{ DOC.score }}}"
}

Add Query Context

{
  "template": "{\"query\": {{ INPUT.query | tojson }}, \"result_id\": \"{{ DOC.document_id }}\", \"score\": {{ DOC.score }}}"
}