The JSON Transform stage applies a Jinja2 template to each document, rendering the template with full document context and replacing the document with the parsed JSON output. Use this to reformat documents for external APIs or reshape data for downstream consumers.
Stage Category: APPLY (1-1 Transformation)
Transformation: N documents → N documents (or fewer with fail_on_error=False)
When to Use
| Use Case | Description |
| --- | --- |
| External API formatting | Format documents for webhook payloads |
| Response optimization | Remove unused fields to reduce bandwidth |
| Schema adaptation | Convert internal format to client-specific format |
| Conditional outputs | Include fields based on document properties |
| Array flattening | Transform nested structures to flat arrays |
| Field renaming | Rename or reorganize document fields |
When NOT to Use
| Scenario | Recommended Alternative |
| --- | --- |
| Filtering documents | structured_filter or llm_filter |
| Sorting documents | sort_by_field or rerank |
| Enriching with new data | document_enrich or api_call |
| Joining external data | taxonomy_enrich |
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| template | string | Required | Jinja2 template that must render to valid JSON |
| fail_on_error | boolean | false | Fail entire pipeline on transformation error |
Template Context
Templates have access to the full retriever execution context:
| Namespace | Description | Example |
| --- | --- | --- |
| DOC / doc | Current document fields and metadata | {{ DOC.document_id }} |
| INPUT / inputs | Original query inputs from search request | {{ INPUT.query }} |
| CONTEXT / context | Execution context (namespace_id, etc.) | {{ CONTEXT.namespace_id }} |
| STAGE / stage | Current stage execution data | {{ STAGE.name }} |
Both uppercase and lowercase namespace formats work identically (DOC == doc).
Template Features
Jinja2 Syntax
| Feature | Syntax | Description |
| --- | --- | --- |
| Variables | {{ DOC.field }} | Output field values |
| Conditionals | {% if %}...{% endif %} | Conditional content |
| Loops | {% for item in items %} | Iterate over arrays |
| Filters | {{ value \| tojson }} | Transform values |
| Comments | {# comment #} | Template comments |
Useful Filters
| Filter | Description | Example |
| --- | --- | --- |
| tojson | JSON-safe encoding | {{ DOC.data \| tojson }} |
| length | Get array/string length | {{ DOC.tags \| length }} |
| default | Fallback value | {{ DOC.optional \| default('N/A') }} |
| first / last | Array element | {{ DOC.items \| first }} |
| join | Join array | {{ DOC.tags \| join(', ') }} |
Configuration Examples
Simple Field Selection
{
  "stage_type": "apply",
  "stage_id": "json_transform",
  "parameters": {
    "template": "{\"id\": \"{{ DOC.document_id }}\", \"content\": \"{{ DOC.text }}\", \"score\": {{ DOC.score }}}"
  }
}
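The template above places DOC.text between literal quotes, so a value containing a double quote or a line break would render to invalid JSON. The sketch below illustrates the difference the tojson filter makes. It does not run Jinja2 or the stage itself: plain string concatenation stands in for `"{{ DOC.text }}"`, and Python's `json.dumps` stands in for `{{ DOC.text | tojson }}` (an approximation of the filter's behavior). The helper names are hypothetical.

```python
import json


def render_quoted(text: str) -> str:
    # Stand-in for the fragment  "content": "{{ DOC.text }}"
    # The value is pasted between quotes with no escaping.
    return '{"content": "' + text + '"}'


def render_tojson(text: str) -> str:
    # Stand-in for the fragment  "content": {{ DOC.text | tojson }}
    # json.dumps adds the surrounding quotes and escapes the value.
    return '{"content": ' + json.dumps(text) + '}'


def parses(rendered: str) -> bool:
    # True if the rendered string is valid JSON.
    try:
        json.loads(rendered)
        return True
    except json.JSONDecodeError:
        return False


plain = "hello world"
tricky = 'She said "hi"\nand left'

print(parses(render_quoted(plain)))   # True
print(parses(render_quoted(tricky)))  # False: raw quote and newline break the JSON
print(parses(render_tojson(tricky)))  # True
```

For free-text fields, writing `{{ DOC.text | tojson }}` without surrounding quotes is usually the safer choice, since the filter emits the quotes itself.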
Error Handling
| Setting | Behavior |
| --- | --- |
| fail_on_error: false (default) | Skip failed documents with warning, continue processing |
| fail_on_error: true | Fail entire retrieval on first transformation error |
Common failure causes:
Invalid template syntax
Template rendering errors (missing fields)
Invalid JSON output from template
Document missing required fields
Use fail_on_error: false for public APIs where partial results are acceptable. Use fail_on_error: true for internal workflows where data integrity is critical.
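The two fail_on_error modes can be sketched as a small loop. This is an illustration of the documented behavior under stated assumptions, not the stage's actual implementation: `apply_transform` and `render_id` are hypothetical names, and `render_id` stands in for rendering a template that requires a `document_id` field.

```python
import json
import logging
from typing import Any, Callable


def apply_transform(
    docs: list[dict[str, Any]],
    render: Callable[[dict[str, Any]], str],
    fail_on_error: bool = False,
) -> list[Any]:
    """Render each document, parse the result as JSON, and collect the outputs."""
    results = []
    for doc in docs:
        try:
            results.append(json.loads(render(doc)))
        except (KeyError, ValueError) as exc:  # missing field or invalid JSON
            if fail_on_error:
                # Strict mode: abort on the first failure.
                raise RuntimeError(f"transformation failed: {exc!r}") from exc
            # Lenient mode: warn, skip this document, keep going.
            logging.warning("skipping document: %r", exc)
    return results


def render_id(doc: dict[str, Any]) -> str:
    # Hypothetical renderer: raises KeyError when document_id is missing.
    return '{"id": ' + json.dumps(doc["document_id"]) + '}'


docs = [{"document_id": "a"}, {"title": "no id"}, {"document_id": "c"}]

print(apply_transform(docs, render_id))  # [{'id': 'a'}, {'id': 'c'}]
```

With `fail_on_error=True`, the same call raises on the second document instead of returning two results, which is why N input documents can yield fewer than N outputs only in the lenient mode.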
Performance
| Metric | Value |
| --- | --- |
| Latency | < 1ms per document |
| Processing | Sequential (fast, no caching needed) |
| Schema | Output completely defined by template |
Multi-line Templates
For complex templates, spread the template over multiple lines and encode each line break as \n inside the JSON string in the API call:
curl -X POST "$MP_API_URL/v1/retrievers" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -d '{
    "stages": [{
      "stage_type": "apply",
      "stage_id": "json_transform",
      "parameters": {
        "template": "{\n \"id\": \"{{ DOC.document_id }}\",\n \"title\": {{ DOC.title | tojson }},\n \"items\": [\n {% for item in DOC.items %}{\n \"name\": \"{{ item.name }}\",\n \"value\": {{ item.value }}\n }{% if not loop.last %},{% endif %}\n {% endfor %}\n ]\n}"
      }
    }]
  }'
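Escaping a multi-line template by hand is error-prone. One option, sketched below, is to keep the template as a readable multi-line string and let a JSON serializer produce the escaped request body. The sketch only builds the body and sends nothing. The `TEMPLATE` and `payload` names are illustrative, and the payload shape simply mirrors the curl example above.

```python
import json

# The same template as in the curl example, written as a readable
# multi-line string. It is Jinja2 source and is not rendered here.
TEMPLATE = """{
  "id": "{{ DOC.document_id }}",
  "title": {{ DOC.title | tojson }},
  "items": [
    {% for item in DOC.items %}{
      "name": "{{ item.name }}",
      "value": {{ item.value }}
    }{% if not loop.last %},{% endif %}
    {% endfor %}
  ]
}"""

payload = {
    "stages": [
        {
            "stage_type": "apply",
            "stage_id": "json_transform",
            "parameters": {"template": TEMPLATE},
        }
    ]
}

# json.dumps escapes the embedded quotes and turns line breaks into \n,
# so the whole request body ends up on a single line.
body = json.dumps(payload)
print(body)
```

The printed body can be saved to a file and passed to curl with `-d @payload.json` in place of the inline string.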
Common Patterns
Drop Unused Fields
{
  "template": "{\"id\": \"{{ DOC.document_id }}\", \"title\": \"{{ DOC.title }}\", \"url\": \"{{ DOC.url }}\"}"
}

{
  "template": "{\"doc_id\": \"{{ DOC.document_id }}\", \"user_id\": \"{{ DOC.metadata.user_id }}\", \"category\": \"{{ DOC.metadata.category }}\", \"score\": {{ DOC.score }}}"
}
Add Query Context
{
  "template": "{\"query\": \"{{ INPUT.query }}\", \"result_id\": \"{{ DOC.document_id }}\", \"score\": {{ DOC.score }}}"
}