Compose stage-based search pipelines over your collections
Retrievers combine feature-aware search stages, structured filters, enrichment joins, and optional LLM post-processing into a single executable pipeline. Each retriever has an input schema, a list of target collections, and a deterministic set of stages executed in order.
Enrich or transform documents without dropping them
Taxonomy joins, API enrichment, LLM enrichers, JSON transforms
Retrieve the live registry with GET /v1/retrievers/stages. Each entry includes stage_id, category, icon, and a parameter schema, so you can build configuration UIs or validation rules dynamically.
Live stages: https://api.mixpeek.com/v1/retrievers/stages
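As a rough illustration of building a configuration menu from the registry, the sketch below groups entries by category. The bearer-token header, the assumption that the response body is a JSON array, the parameter_schema key name, and the sample entries are all hypothetical; only the endpoint and the stage_id, category, and icon fields come from the description above.

```python
import json
import urllib.request
from collections import defaultdict

STAGES_URL = "https://api.mixpeek.com/v1/retrievers/stages"


def fetch_stage_registry(api_key: str) -> list[dict]:
    """Fetch the live stage registry (performs a network call).

    Assumptions: bearer-token auth and a JSON array response body.
    """
    request = urllib.request.Request(
        STAGES_URL,
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def group_by_category(entries: list[dict]) -> dict[str, list[str]]:
    """Map each category to the sorted stage_ids it contains."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        grouped[entry["category"]].append(entry["stage_id"])
    return {category: sorted(ids) for category, ids in grouped.items()}


# Hypothetical entries shaped after the fields listed above;
# "parameter_schema" is an assumed key name.
sample_entries = [
    {"stage_id": "stage_b", "category": "apply", "icon": "plug", "parameter_schema": {}},
    {"stage_id": "stage_a", "category": "apply", "icon": "link", "parameter_schema": {}},
    {"stage_id": "stage_c", "category": "filter", "icon": "funnel", "parameter_schema": {}},
]
menu = group_by_category(sample_entries)
```

In a real client, the list returned by fetch_stage_registry would take the place of sample_entries.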
Joins documents with data from another collection, similar to a SQL LEFT JOIN. Each input document produces exactly one output document with added fields from the target collection.
When to use:
- Combine data from multiple collections (e.g., products + catalog info)
- Attach user profiles, metadata, or related entities
- Denormalize data at query time
Parameters:
| Parameter | Required | Description |
| --- | --- | --- |
| target_collection_id | Yes | Collection to join with |
| source_field | Yes* | Field in current documents to match |
| target_field | Yes* | Field in target collection to match against |
| fields_to_merge | No | Specific fields to merge (or entire document if omitted) |
| output_field | No | Where to place enrichment (root or nested path) |
| retriever_id | No | Use an existing retriever for lookup instead of direct field matching |
| retriever_config | No | Anonymous retriever definition for complex lookups |
| retriever_inputs | No | Template inputs when using retriever-based enrichment |
| strategy | No | enrich (merge fields) or append (add as nested object) |
| allow_missing | No | Keep documents without matches (default: true) |
| when | No | Conditional filter for selective enrichment |
| cache_behavior | No | auto, disabled, or aggressive |
| cache_ttl_seconds | No | Cache TTL in seconds |
*Required for direct joins; not needed when using retriever_id or retriever_config.
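To make the LEFT JOIN semantics concrete, here is a minimal in-memory sketch of a direct join. It is not the stage's implementation: the function names, the first-match-wins rule for duplicate keys, and dropping unmatched documents when allow_missing is false are assumptions. The parameter names mirror the table above.

```python
import copy
from typing import Any


def set_path(doc: dict, dot_path: str, value: Any) -> None:
    """Write value into doc at a dot-separated path, creating dicts as needed."""
    *parents, leaf = dot_path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def left_join(docs, targets, source_field, target_field,
              fields_to_merge=None, output_field=None, allow_missing=True):
    """Join docs against targets; at most one output document per input."""
    index: dict = {}
    for target in targets:
        if target_field in target:
            # Assumption: the first target with a given key wins.
            index.setdefault(target[target_field], target)

    joined = []
    for doc in docs:
        key = doc.get(source_field)
        match = index.get(key) if key is not None else None
        if match is None:
            # Assumption: unmatched documents pass through unchanged when
            # allow_missing is true and are dropped otherwise.
            if allow_missing:
                joined.append(copy.deepcopy(doc))
            continue
        extra = {
            k: copy.deepcopy(v)
            for k, v in match.items()
            if fields_to_merge is None or k in fields_to_merge
        }
        out = copy.deepcopy(doc)
        if output_field:
            set_path(out, output_field, extra)  # nest under a path
        else:
            out.update(extra)                   # merge at the root
        joined.append(out)
    return joined


products = [{"sku": "A1", "name": "Lamp"}, {"sku": "B2", "name": "Desk"}]
catalog = [{"sku": "A1", "price": 30, "brand": "Lumo"}]
result = left_join(products, catalog, "sku", "sku",
                   fields_to_merge=["price"], output_field="catalog")
strict = left_join(products, catalog, "sku", "sku", allow_missing=False)
```

Here result keeps both products and nests the price under catalog for the matching one, while strict contains only the matched product.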
Enriches documents by calling external HTTP APIs. Enables integration with third-party services (Stripe, GitHub, weather APIs, etc.) to augment documents with real-time data.
Security: This stage makes external HTTP requests. Always use allowed_domains to prevent SSRF attacks. Never store credentials directly—use auth.secret_ref to reference vault-stored secrets.
Parameters:
| Parameter | Required | Description |
| --- | --- | --- |
| url | Yes | API endpoint URL (supports {DOC.field} and {INPUT.field} templates) |
| allowed_domains | Yes | Domain allowlist for SSRF protection (never use *) |
| output_field | Yes | Dot-path where API response should be stored |
| method | No | HTTP method: GET, POST, PUT, PATCH, DELETE (default: GET) |
| auth | No | Authentication configuration (see below) |
| headers | No | Additional HTTP headers |
| body | No | Request body for POST/PUT/PATCH (JSON, supports templates) |
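The interaction between url templates and allowed_domains can be sketched as follows: render the template first, then check the final host against the allowlist. This is an illustration only. The helper names are hypothetical, and exact-host matching, percent-encoding of substituted values, and the http/https restriction are assumptions rather than documented behavior of the stage.

```python
import re
from urllib.parse import quote, urlsplit

# Matches {DOC.some.path} and {INPUT.some.path} placeholders.
PLACEHOLDER = re.compile(r"\{(DOC|INPUT)\.([A-Za-z0-9_.]+)\}")


def lookup(data: dict, dot_path: str):
    """Follow a dot-separated path through nested dicts."""
    for key in dot_path.split("."):
        data = data[key]
    return data


def render_url(template: str, doc: dict, inputs: dict) -> str:
    """Substitute placeholders, percent-encoding every inserted value."""
    scopes = {"DOC": doc, "INPUT": inputs}

    def substitute(match: re.Match) -> str:
        value = lookup(scopes[match.group(1)], match.group(2))
        return quote(str(value), safe="")

    return PLACEHOLDER.sub(substitute, template)


def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Allow only http(s) URLs whose host exactly matches an allowlist entry."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    return host in {domain.lower() for domain in allowed_domains}


url = render_url(
    "https://api.github.com/users/{DOC.metadata.login}",
    doc={"metadata": {"login": "octo cat"}},
    inputs={},
)
ok = is_allowed(url, ["api.github.com"])
spoofed = is_allowed("https://api.github.com.evil.example/x", ["api.github.com"])
```

Checking the rendered URL rather than the template matters because a document field could otherwise alter the destination host.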
Applies a Jinja2 template to each document, rendering the template with full document context and replacing the document with the parsed JSON output. Use this to reformat documents for external APIs or reshape data for downstream consumers.
Parameters:
| Parameter | Required | Description |
| --- | --- | --- |
| template | Yes | Jinja2 template string that must render to valid JSON |
| fail_on_error | No | Fail entire pipeline on transformation error (default: false) |
Performs AI-native web search using Exa’s neural ranking system. Creates new documents from web search results, enabling retriever pipelines to incorporate real-time internet content.
This stage creates new documents (0 → M transformation) rather than enriching existing ones. Use it at the start of a pipeline or to augment internal results with external web sources.
Parameters:
| Parameter | Required | Description |
| --- | --- | --- |
| query | Yes | Search query (supports {INPUT.field} and {DOC.field} templates) |
| num_results | No | Number of results (1-100, default: 10) |
| use_autoprompt | No | Enable Exa’s query enhancement (default: true) |
| start_published_date | No | Filter by publication date (YYYY-MM-DD format) |
| category | No | Content category: research paper, news, github, tweet, blog, company, pdf |
| include_text | No | Include text snippets in results (default: true) |
Output schema: each result becomes a document with the following fields:
- metadata.url – Web page URL
- metadata.title – Page title
- metadata.text – Text snippet (if include_text=true)
- metadata.published_date – Publication date (if available)
- metadata.author – Author name (if available)
- metadata.search_query – Original query used
- metadata.search_position – 0-indexed position in results
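The output schema above can be sketched as a mapping from raw search hits to documents. The shape of the raw hits (a list of dicts with url, title, text, published_date, and author keys) and the function name are assumptions made for illustration; only the metadata field names come from the list above.

```python
def results_to_documents(query: str, results: list[dict],
                         include_text: bool = True) -> list[dict]:
    """Turn raw web search hits into documents with a metadata block."""
    documents = []
    for position, hit in enumerate(results):  # search_position is 0-indexed
        metadata = {
            "url": hit["url"],
            "title": hit["title"],
            "search_query": query,
            "search_position": position,
        }
        if include_text and hit.get("text"):
            metadata["text"] = hit["text"]
        # Optional fields are included only when the hit provides them.
        for optional in ("published_date", "author"):
            if hit.get(optional):
                metadata[optional] = hit[optional]
        documents.append({"metadata": metadata})
    return documents


# Hypothetical raw hits used only to exercise the mapping.
hits = [
    {"url": "https://example.com/a", "title": "A", "text": "snippet",
     "published_date": "2024-01-15"},
    {"url": "https://example.com/b", "title": "B", "author": "Jo"},
]
docs = results_to_documents("vector search", hits)
docs_without_text = results_to_documents("vector search", hits, include_text=False)
```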
Stages support dynamic configuration through template expressions using Jinja2 syntax. Both uppercase and lowercase namespace formats are supported and work identically:
| Namespace | Description | Examples |
| --- | --- | --- |
| INPUT / inputs | User-provided query parameters and inputs | {{INPUT.query_text}}, {{inputs.max_price}} |
| DOC / doc | Current document fields (for per-document logic) | {{DOC.metadata.category}}, {{doc.content_type}} |
| CONTEXT / context | Execution state (budget, timing, retriever metadata) | |
Mixed usage within the same stage is supported. For example, you can use {{INPUT.query}} alongside {{context.budget_remaining}} in the same configuration.
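One way to picture the uppercase and lowercase aliases is a rendering context in which both names point at the same data. The sketch below is a plain-Python stand-in for the lookup, not Jinja2 itself and not the platform's implementation; the function names and sample values are hypothetical.

```python
NAMESPACE_ALIASES = {"INPUT": "inputs", "DOC": "doc", "CONTEXT": "context"}


def build_template_context(inputs: dict, doc: dict, context: dict) -> dict:
    """Expose each namespace under both its lowercase and uppercase name."""
    canonical = {"inputs": inputs, "doc": doc, "context": context}
    template_context = dict(canonical)
    for upper, lower in NAMESPACE_ALIASES.items():
        template_context[upper] = canonical[lower]  # same object, second name
    return template_context


def resolve(expression: str, template_context: dict):
    """Resolve a simple '{{NAMESPACE.dotted.path}}' expression."""
    path = expression.strip().removeprefix("{{").removesuffix("}}").strip()
    value = template_context
    for key in path.split("."):
        value = value[key]
    return value


ctx = build_template_context(
    inputs={"query": "red shoes"},
    doc={"metadata": {"category": "footwear"}},
    context={"budget_remaining": 0.5},
)
upper = resolve("{{INPUT.query}}", ctx)
lower = resolve("{{inputs.query}}", ctx)
budget = resolve("{{context.budget_remaining}}", ctx)
```

Because both names refer to the same underlying dictionary, mixing {{INPUT.query}} and {{context.budget_remaining}} in one configuration resolves consistently.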
Use PATCH /v1/retrievers/{id} to rename retrievers or adjust cache settings (stages and schema are immutable; create a new retriever for breaking changes).
List retrievers with filters, search, and sort: POST /v1/retrievers/list.
Retrieve execution history: GET /v1/retrievers/{id}/executions.
Diagnose pipelines without executing: POST /v1/retrievers/{id}/explain.
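The four management calls above could be prepared along these lines. The sketch only builds request objects and sends nothing. The bearer-token header, the retriever ID, and every request body (including the name field in the PATCH payload and the empty JSON bodies) are hypothetical placeholders, since the payload schemas are not described here.

```python
import json
import urllib.request

BASE_URL = "https://api.mixpeek.com"


def build_request(method: str, path: str, api_key: str, body=None):
    """Build (but do not send) a request for a management endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed auth scheme
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


retriever_id = "ret_123"   # hypothetical ID
api_key = "YOUR_API_KEY"   # placeholder

prepared = {
    # Hypothetical payload: the real field name for renaming may differ.
    "update": build_request("PATCH", f"/v1/retrievers/{retriever_id}",
                            api_key, {"name": "product-search-v2"}),
    # Empty bodies are placeholders for filter, search, and sort options.
    "list": build_request("POST", "/v1/retrievers/list", api_key, {}),
    "executions": build_request(
        "GET", f"/v1/retrievers/{retriever_id}/executions", api_key),
    "explain": build_request(
        "POST", f"/v1/retrievers/{retriever_id}/explain", api_key, {}),
}
```

Each prepared request can then be passed to urllib.request.urlopen once the payloads match the actual API schema.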