Execute Anonymous Retriever

curl --request POST \
  --url https://api.mixpeek.com/v1/retrievers/execute \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --header 'X-Namespace: <x-namespace>' \
  --data '
{
  "collection_identifiers": [
    "<string>"
  ],
  "input_schema": {},
  "stages": [
    {
      "stage_name": "<string>",
      "config": {},
      "stage_type": "filter",
      "batch_size": "<string>",
      "description": "<string>"
    }
  ],
  "inputs": {},
  "budget_limits": {
    "max_credits": 100,
    "max_time_ms": 60000
  }
}
'

{
  "execution_id": "<string>",
  "status": "<string>",
  "documents": [
    {}
  ],
  "pagination": {},
  "stage_statistics": {
    "stages": {},
    "total_time_ms": 0,
    "credits_used": 0
  },
  "budget": {},
  "error": "Retriever execution failed: Collection not found",
  "optimization_applied": false,
  "optimization_summary": {
    "optimization_time_ms": 8.2,
    "optimized_stage_count": 3,
    "original_stage_count": 5,
    "rules_applied": [
      "push_down_filters",
      "group_by_push_down"
    ],
    "stage_reduction_pct": 40
  }
}

Execute a retriever anonymously (ad-hoc) without persisting the configuration.

The retriever is never saved to the database, which makes this endpoint useful for one-time queries, testing configurations, or temporary searches.

Streaming Execution (stream=True): The response uses Server-Sent Events (SSE) format with Content-Type: text/event-stream. Each stage emits events as it executes, and each event is sent as one SSE frame: data: <event payload>\n\n

Event Types (StreamEventType):

stage_start: Emitted when a stage begins (includes stage_name, stage_index, total_stages)
stage_complete: Emitted when a stage finishes (includes documents, statistics, budget_used)
stage_error: Emitted if a stage fails (includes error message)
execution_complete: Final event with complete results and pagination
execution_error: Emitted if entire execution fails

StreamStageEvent Fields:

event_type: Type of event
execution_id: Unique execution identifier
stage_name/stage_index/total_stages: Stage progress info
documents: Intermediate results (stage_complete only)
statistics: Stage metrics (duration_ms, input_count, output_count, efficiency)
budget_used: Cumulative consumption (credits_used, time_elapsed_ms, tokens_used)

Response Headers:

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
X-Execution-Mode: anonymous
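
The event stream described above can be consumed with a small frame parser. The following is a minimal sketch, assuming each data: line carries one JSON-encoded StreamStageEvent (the payload encoding is an assumption; this reference does not state it). The helper name iter_sse_events and the sample values are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded event per SSE frame (a frame ends at a blank line)."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            # Drop the "data:" prefix; SSE allows one optional space after it.
            value = line[len("data:"):]
            data_parts.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_parts:
            # A blank line closes the frame; multiple data lines join with "\n".
            yield json.loads("\n".join(data_parts))
            data_parts = []


# Hand-written sample stream; values are illustrative, not captured output.
sample = [
    'data: {"event_type": "stage_start", "execution_id": "exec_abc", '
    '"stage_name": "search", "stage_index": 0, "total_stages": 1}\n',
    "\n",
    'data: {"event_type": "execution_complete", "execution_id": "exec_abc"}\n',
    "\n",
]
events = list(iter_sse_events(sample))
event_types = [event["event_type"] for event in events]
```

In practice the lines would come from the HTTP response body read line by line; a client would typically stop reading after execution_complete or execution_error.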

Standard Execution (stream=False, default):

Returns ExecuteRetrieverResponse after all stages complete
Includes X-Execution-Mode: anonymous header
execution_metadata.retriever_persisted = False

Use Cases:

One-time queries without saving retriever configuration
Testing stage configurations before persisting
Dynamic retrieval with varying parameters
Real-time progress tracking with streaming

POST /v1/retrievers/execute

Headers

Authorization

string

required

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

X-Namespace

string

required

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'
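
The two required headers can be attached to a request as in the sketch below. It uses only the Python standard library and builds the request without sending it. The helper name build_execute_request is hypothetical, and the key, namespace, and body values are placeholders taken from the examples on this page.

```python
import json
import urllib.request

API_URL = "https://api.mixpeek.com/v1/retrievers/execute"


def build_execute_request(api_key: str, namespace: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for an anonymous execution."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # documented format: 'Bearer sk_...'
            "Content-Type": "application/json",
            "X-Namespace": namespace,              # namespace name or ns_... ID
        },
        method="POST",
    )


# Placeholder values; replace with a real key, namespace, and stage config.
request = build_execute_request(
    api_key="sk_xxxxxxxxxxxxx",
    namespace="my-namespace",
    body={
        "collection_identifiers": ["col_123"],
        "input_schema": {"query": {"type": "text", "required": True}},
        "stages": [
            {
                "stage_name": "search",
                "stage_type": "filter",
                "config": {"stage_id": "feature_filter"},
            }
        ],
        "inputs": {"query": "machine learning"},
    },
)
```

Passing the request to urllib.request.urlopen would send it; json.dumps converts Python's True to the JSON literal true.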

Body

application/json

Request to execute a retriever anonymously (ad-hoc) without persistence.

This combines retriever creation parameters with execution inputs to allow one-time retrieval without saving the retriever configuration.

Use Cases:
- One-time queries without polluting the retriever registry
- Testing retriever configurations before persisting
- Dynamic retrieval with varying stage configurations
- Temporary search operations

Behavior:
- Retriever is NOT saved to database
- Execution history is NOT tracked
- Response includes X-Execution-Mode: anonymous header
- execution_metadata.retriever_persisted = False

Examples:

Simple anonymous search:

{
  "collection_identifiers": ["col_123"],
  "input_schema": {
    "query": {"type": "text", "required": true}
  },
  "stages": [
    {
      "stage_name": "search",
      "stage_type": "filter",
      "config": {
        "stage_id": "feature_filter",
        "parameters": {
          "feature_uris": [
            {
              "uri": "urn:embedding:text:bge_base_en_v1_5:1",
              "input": {"text": "{{inputs.query}}"}
            }
          ],
          "limit": 10
        }
      }
    }
  ],
  "inputs": {"query": "machine learning"},
  "limit": 25
}

collection_identifiers

string[]

required

REQUIRED. Collection identifiers to query. Each entry may be a collection name or a collection ID; names are resolved automatically.

Minimum array length: 1

input_schema

Input Schema · object

required

REQUIRED. Input schema defining expected inputs. Each key is an input name, value is a BucketSchemaField.

input_schema.{key}

BucketSchemaField · object

Schema field definition for bucket objects.

input_schema.{key}.type

enum<string>

required

Supported data types for bucket schema fields.

Types fall into two categories:

Explicit Types (Type-Safe):
- Provide type guarantees for extractors
- User must pre-classify content
- Compatible with type-specific extractors
Automatic Type (Flexible):
- No type guarantee (runtime detection)
- User doesn't pre-classify content
- Only compatible with multimodal extractors

Available options:

string, number, integer, float, boolean, object, array, date, datetime, json, file, text, image, audio, video, pdf, document, spreadsheet, presentation, dense_vector, sparse_vector, int8_vector, automatic

input_schema.{key}.default

unknown

input_schema.{key}.items

unknown

input_schema.{key}.properties

Properties · object

input_schema.{key}.examples

any[] | null

OPTIONAL. List of example values for this field. Used by Apps to show example inputs in the UI. Provide multiple diverse examples when possible.

input_schema.{key}.description

string | null

input_schema.{key}.enum

any[] | null

input_schema.{key}.required

boolean | null

default: false

stages

StageConfig · object[]

required

REQUIRED. Ordered list of stage configurations. At least one stage is required for execution.

Minimum array length: 1

stages.stage_name

string

required

Human-readable stage instance name (REQUIRED).

Minimum string length: 1

stages.config

Config · object

required

Stage implementation parameters (REQUIRED). Must include stage_id key referencing a registered retriever stage. Supports template expressions using Jinja2 syntax resolved at execution time. Template namespaces support both uppercase and lowercase formats: {{INPUT.field}} or {{inputs.field}}, {{DOC.field}} or {{doc.field}}, {{CONTEXT.field}} or {{context.field}}, {{STAGE.field}} or {{stage.field}}. All formats work identically. Provide stage-specific configuration under parameters.
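
The namespace equivalence described above ({{INPUT.field}} and {{inputs.field}} resolving identically) can be illustrated with a toy resolver. This sketch deliberately uses a regular expression instead of Jinja2 and handles only the simple {{namespace.field}} form, so it is an illustration of the aliasing rule, not of how the service resolves templates. The names NAMESPACE_ALIASES and resolve_template are hypothetical.

```python
import re

# Uppercase and lowercase spellings that the reference describes as equivalent.
NAMESPACE_ALIASES = {
    "INPUT": "inputs", "inputs": "inputs",
    "DOC": "doc", "doc": "doc",
    "CONTEXT": "context", "context": "context",
    "STAGE": "stage", "stage": "stage",
}

# Matches only the simple {{namespace.field}} form.
TEMPLATE = re.compile(r"\{\{\s*(\w+)\.(\w+)\s*\}\}")


def resolve_template(text: str, scopes: dict) -> str:
    """Replace each {{namespace.field}} with the matching value from scopes."""
    def substitute(match):
        namespace = NAMESPACE_ALIASES[match.group(1)]  # KeyError on unknown namespace
        return str(scopes[namespace][match.group(2)])
    return TEMPLATE.sub(substitute, text)


scopes = {"inputs": {"query": "machine learning"}}
lower = resolve_template("{{inputs.query}}", scopes)
upper = resolve_template("{{INPUT.query}}", scopes)
```

Both calls produce the same string because the two spellings map to the same scope.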

stages.stage_type

enum<string> | null

Functional category of the stage. Optional for creation requests; auto-inferred from stage_id when omitted.

Available options:

filter, sort, reduce, apply

stages.batch_size

string | null

Optional templated batch size expression evaluated per execution. Supports template variables: {{INPUT.page_size}}, {{inputs.page_size}}, {{CONTEXT.budget_remaining}}, etc. Both uppercase and lowercase namespace names are supported (e.g., INPUT/inputs, DOC/doc, CONTEXT/context, STAGE/stage). Defaults to stage-specific value when omitted.

stages.description

string | null

User-facing description of the stage (OPTIONAL).

inputs

Inputs · object

required

REQUIRED. Input values matching the input_schema. These values are passed to stages for parameterization.
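
Because inputs must match input_schema, a client may want to check for missing required inputs before sending the request. The sketch below is a hypothetical client-side pre-check; this reference does not describe how the server itself validates inputs. The function name missing_required_inputs and the page_size field are illustrative.

```python
def missing_required_inputs(input_schema: dict, inputs: dict) -> list:
    """Return names marked required in the schema but absent from inputs."""
    return [
        name
        for name, field in input_schema.items()
        # "required" defaults to false when omitted or null.
        if field.get("required") and name not in inputs
    ]


input_schema = {
    "query": {"type": "text", "required": True},
    "page_size": {"type": "integer"},  # optional: no "required" key
}
complete = missing_required_inputs(input_schema, {"query": "machine learning"})
incomplete = missing_required_inputs(input_schema, {})
```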

budget_limits

BudgetLimits · object

OPTIONAL. Budget limits for execution.

budget_limits.max_credits

number | null

Maximum credits allowed for a single execution (OPTIONAL).

Required range: x >= 0

budget_limits.max_time_ms

integer | null

Maximum wall-clock time in milliseconds before forcing halt (OPTIONAL).

Required range: x >= 0

Example:

{ "max_credits": 100, "max_time_ms": 60000 }
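
The documented ranges (x >= 0 for both fields, integer for max_time_ms) can be checked on the client before a request is sent. This is a minimal sketch; budget_limit_problems is a hypothetical helper and the message wording is made up.

```python
def budget_limit_problems(budget_limits: dict) -> list:
    """Return a list of range or type violations; empty means the limits look valid."""
    problems = []
    for field in ("max_credits", "max_time_ms"):
        value = budget_limits.get(field)
        # Both fields are nullable; only check values that are actually set.
        if value is not None and value < 0:
            problems.append(f"{field} must be >= 0, got {value}")
    max_time = budget_limits.get("max_time_ms")
    if max_time is not None and not isinstance(max_time, int):
        problems.append("max_time_ms must be an integer")
    return problems


valid = budget_limit_problems({"max_credits": 100, "max_time_ms": 60000})
invalid = budget_limit_problems({"max_credits": -1})
```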

Response

Successful Response

Response from retriever execution.

execution_id

string

required

REQUIRED. Unique identifier for this execution run. Use this ID to track execution status, retrieve execution details, or query execution history. Format: 'exec_' prefix followed by alphanumeric token.

status

string

required

REQUIRED. Execution status indicating current state. Common values: 'completed', 'failed', 'processing', 'pending'. Check this field to determine if execution succeeded or requires retry.
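
A client can branch on status before reading results. The sketch below assumes, as the response schema on this page describes, that the top-level error field is populated when status is 'failed'. The helper names execution_error and stage_errors are hypothetical.

```python
def execution_error(response: dict):
    """Return the retriever-level error message, or None if the run did not fail."""
    if response.get("status") != "failed":
        return None
    return response.get("error") or "execution failed without an error message"


def stage_errors(response: dict) -> dict:
    """Collect per-stage error messages keyed by stage instance name."""
    stages = response.get("stage_statistics", {}).get("stages", {})
    return {name: stats["error"] for name, stats in stages.items() if stats.get("error")}


# Hand-written responses for illustration only.
failed = {
    "execution_id": "exec_abc",
    "status": "failed",
    "documents": [],
    "error": "Retriever execution failed: Collection not found",
}
completed = {"execution_id": "exec_def", "status": "completed", "documents": [{}]}
failure_message = execution_error(failed)
```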

documents

Documents · object[]

REQUIRED. Final document results after retriever completion. Contains documents that passed through all retriever stages. Each document may include: document_id, payload (full document data), score (relevance score), metadata (collection-specific fields), and any fields added by enrichment/join stages. Empty array indicates no documents matched the query criteria. Note: Legacy format may use 'final_results' instead of 'documents'.

pagination

Pagination · object

REQUIRED. Pagination metadata structure. Format varies by pagination method: Offset pagination: {total, limit, offset, has_next, has_previous}, Cursor pagination: {cursor, has_next, page_size}, Keyset pagination: {next_cursor, has_next}. Use this to navigate through result pages.
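
Since the pagination object has three possible shapes, a client may need to detect which one it received. The sketch below infers the method from the keys listed above and, for offset pagination, computes the next offset as offset + limit. Key-based detection is an assumption for illustration; both helper names are hypothetical.

```python
def pagination_method(pagination: dict) -> str:
    """Guess the pagination method from the keys present in the object."""
    if "offset" in pagination:
        return "offset"
    if "next_cursor" in pagination:
        return "keyset"
    if "cursor" in pagination:
        return "cursor"
    return "unknown"


def next_offset(pagination: dict):
    """For offset pagination with more results, return the next page's offset."""
    if pagination_method(pagination) != "offset" or not pagination.get("has_next"):
        return None
    return pagination["offset"] + pagination["limit"]


# Illustrative values only.
page = {"total": 42, "limit": 10, "offset": 20, "has_next": True, "has_previous": True}
following_offset = next_offset(page)
```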

stage_statistics

RetrieverExecutionStatistics · object

REQUIRED. Per-stage execution statistics including timing, document counts, cache hit rates, and stage-specific metrics. Use this to understand retriever performance and identify bottlenecks.

stage_statistics.stages

Stages · object

Per-stage statistics keyed by stage instance name (REQUIRED).

stage_statistics.stages.{key}

StageStatistics · object

Execution metrics for a single stage in a retriever execution run.

stage_statistics.stages.{key}.input_count

integer

required

Number of documents received by the stage (REQUIRED).

Required range: x >= 0

stage_statistics.stages.{key}.output_count

integer

required

Number of documents emitted by the stage (REQUIRED).

Required range: x >= 0

stage_statistics.stages.{key}.duration_ms

number

required

Wall-clock duration in milliseconds (REQUIRED).

Required range: x >= 0

stage_statistics.stages.{key}.efficiency

number

required

Ratio of output_count to input_count; 0 when input_count is 0 (REQUIRED).

Required range: x >= 0

stage_statistics.stages.{key}.cache_hit

boolean | null

Indicates whether the result originated from stage cache (OPTIONAL).

stage_statistics.stages.{key}.error

string | null

Stage-specific error message if execution failed but retriever execution continued (OPTIONAL).

stage_statistics.stages.{key}.llm_calls

integer | null

Number of LLM invocations performed by the stage (OPTIONAL).

Required range: x >= 0

stage_statistics.stages.{key}.tokens_used

integer | null

Total tokens consumed by the stage (OPTIONAL, only for LLM stages).

Required range: x >= 0

stage_statistics.stages.{key}.metadata

Metadata · object

Stage-specific metadata containing additional execution details (OPTIONAL). For example, join stages include: join_strategy, join_type, matched_count, match_rate, etc. LLM stages may include: model_name, temperature, max_tokens, etc.

stage_statistics.total_time_ms

number

default: 0

Total retriever execution time in milliseconds (REQUIRED).

Required range: x >= 0

stage_statistics.credits_used

number

default: 0

Total credits consumed across all stages (OPTIONAL in MVP).

Required range: x >= 0
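
The per-stage statistics can be used to find a bottleneck, as the stage_statistics description suggests. The sketch below picks the stage with the largest duration_ms and recomputes efficiency from the documented definition (output/input, 0 when input_count is 0). The stage names and numbers are invented for illustration, and both helper names are hypothetical.

```python
def slowest_stage(stage_statistics: dict):
    """Return the name of the stage with the largest duration_ms, or None."""
    stages = stage_statistics.get("stages", {})
    if not stages:
        return None
    return max(stages, key=lambda name: stages[name]["duration_ms"])


def efficiency(input_count: int, output_count: int) -> float:
    """Output/input ratio, defined as 0 when the stage received no documents."""
    return output_count / input_count if input_count else 0.0


# Hand-written statistics; not real execution output.
stats = {
    "stages": {
        "search": {"input_count": 0, "output_count": 50, "duration_ms": 120.0, "efficiency": 0},
        "rerank": {"input_count": 50, "output_count": 10, "duration_ms": 480.5, "efficiency": 0.2},
    },
    "total_time_ms": 600.5,
    "credits_used": 0,
}
bottleneck = slowest_stage(stats)
```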

budget

Budget · object

REQUIRED. Budget usage snapshot for this execution. Contains: credits_used (credits consumed), credits_remaining (remaining budget), time_used_ms (execution time), and budget limits. Use this to track resource consumption and enforce budget limits.

error

string | null

OPTIONAL. Retriever-level error message if execution failed. Only present when status='failed'. Contains human-readable error description to help diagnose the failure. Check stage_statistics for stage-specific errors.

Example:

"Retriever execution failed: Collection not found"

optimization_applied

boolean

default: false

OPTIONAL. Whether automatic pipeline optimizations were applied before execution. Mixpeek automatically optimizes retrieval pipelines for performance by reordering stages, merging operations, and pushing work to the database layer. Optimizations preserve logical equivalence: you get the same results, just faster. When true, see optimization_summary for details about what changed.

optimization_summary

Optimization Summary · object

OPTIONAL. Summary of pipeline optimizations applied before execution. Only present when optimization_applied=true.

Contains:
- original_stage_count: Number of stages in your original pipeline
- optimized_stage_count: Number of stages after optimization
- optimization_time_ms: Time spent optimizing (typically <100ms)
- rules_applied: List of optimization rules that fired
- stage_reduction_pct: Percentage reduction in stage count

Use this to understand how the optimizer improved your pipeline. See OptimizationRuleType enum for detailed rule descriptions.

Example:

{
  "optimization_time_ms": 8.2,
  "optimized_stage_count": 3,
  "original_stage_count": 5,
  "rules_applied": ["push_down_filters", "group_by_push_down"],
  "stage_reduction_pct": 40
}
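
The numbers in this example are consistent with stage_reduction_pct being the share of stages removed: (5 - 3) / 5 = 40%. The sketch below reproduces that arithmetic. The formula is inferred from the example rather than stated by this reference, and the function is a hypothetical helper that happens to share the field's name.

```python
def stage_reduction_pct(original_stage_count: int, optimized_stage_count: int) -> float:
    """Percentage of stages removed by optimization (assumed formula)."""
    if original_stage_count <= 0:
        return 0.0  # avoid dividing by zero for an empty pipeline
    removed = original_stage_count - optimized_stage_count
    return 100.0 * removed / original_stage_count


# Values from the example above: 5 original stages, 3 after optimization.
pct = stage_reduction_pct(5, 3)
```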
