This guide spins up the end-to-end Mixpeek workflow: create an isolated namespace, register raw objects, materialize features through the Engine, and query results with a stage-based retriever. Every request matches the current OpenAPI specification.
Prerequisites
- A Mixpeek account and API key (obtain one at mixpeek.com/start)
- curl (or an HTTP client of your choice)
- Basic familiarity with JSON payloads
Export the API base URL and your key so the examples below can reference them:
export MP_API_URL="https://api.mixpeek.com"
export MP_API_KEY="sk_live_replace_me"
All subsequent examples send the following two headers (the namespace-creation request in step 1 sends only the Authorization header):
-H "Authorization: Bearer $MP_API_KEY"
-H "X-Namespace: ns_quickstart" # replace with your namespace id once created
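To avoid repeating these headers in every command, a small shell wrapper can add them for you. This is a minimal sketch: the function name `mp_api` is hypothetical and not part of any Mixpeek tooling, and it assumes `MP_API_URL` and `MP_API_KEY` are exported as above, with `MP_NAMESPACE` exported once your namespace exists.

```shell
# mp_api METHOD ENDPOINT [JSON_BODY]
# Hypothetical helper (not shipped by Mixpeek): wraps curl with this guide's headers.
mp_api() {
  method=$1
  endpoint=$2
  body=${3-}

  # Arguments shared by every request.
  set -- -sS -X "$method" "$MP_API_URL$endpoint" \
    -H "Authorization: Bearer $MP_API_KEY"

  # Add the namespace header only when MP_NAMESPACE is set
  # (it is still unset while you create the namespace).
  if [ -n "${MP_NAMESPACE-}" ]; then
    set -- "$@" -H "X-Namespace: $MP_NAMESPACE"
  fi

  # Send a JSON body when one is supplied.
  if [ -n "$body" ]; then
    set -- "$@" -H "Content-Type: application/json" -d "$body"
  fi

  curl "$@"
}
```

With this in place, a call such as `mp_api GET "/v1/tasks/<task_id>"` is equivalent to the longer curl form used in the rest of this guide.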
1. Create (or Choose) a Namespace
Namespaces guarantee tenant isolation across MongoDB, Qdrant, Redis, and task execution. If you already have one, skip to step 2.
curl -sS -X POST "$MP_API_URL/v1/namespaces" \
-H "Authorization: Bearer $MP_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"namespace_name": "quickstart",
"description": "Docs quickstart namespace",
"feature_extractors": [
{ "feature_extractor_name": "text_extractor", "version": "v1" }
]
}'
Copy the returned namespace_id and export it:
export MP_NAMESPACE="ns_quickstart"
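Rather than copying the ID by hand, you can extract it from the response in the shell. The sketch below assumes the response JSON carries a string field named `namespace_id` (the guide names the field but does not show the response body). `json_field` is a hypothetical helper and the sample response is illustrative only; a real JSON parser such as jq is more robust if you have one installed.

```shell
# json_field NAME: read JSON on stdin and print the string value of "NAME".
# Regex-based sketch: handles simple string fields without escaped quotes.
# If the key appears more than once, the last occurrence wins.
json_field() {
  tr -d '\n' |
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Illustrative response body (not real API output).
sample_response='{"namespace_id": "ns_example123", "namespace_name": "quickstart"}'

namespace_id=$(printf '%s' "$sample_response" | json_field namespace_id)
echo "$namespace_id"
```

In practice you would capture the curl output in a variable, pass it through `json_field`, and then run `export MP_NAMESPACE="$namespace_id"`. The same helper applies to `bucket_id`, `object_id`, and the other IDs returned in later steps, under the same assumption about the response shape.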
2. Create a Bucket
Buckets validate object shape and track blobs in S3-compatible storage.
curl -sS -X POST "$MP_API_URL/v1/buckets" \
-H "Authorization: Bearer $MP_API_KEY" \
-H "X-Namespace: $MP_NAMESPACE" \
-H "Content-Type: application/json" \
-d '{
"bucket_name": "quickstart-bucket",
"description": "Sample product descriptions",
"schema": {
"properties": {
"product_text": { "type": "text", "required": true }
}
}
}'
Set an environment variable for the bucket_id returned above.
3. Create a Collection
Collections map bucket objects into documents by running feature extractors on the Engine. In v2 the feature_extractor field is singular.
curl -sS -X POST "$MP_API_URL/v1/collections" \
-H "Authorization: Bearer $MP_API_KEY" \
-H "X-Namespace: $MP_NAMESPACE" \
-H "Content-Type: application/json" \
-d '{
"collection_name": "quickstart-docs",
"description": "Embeddings for product text",
"source": {
"type": "bucket",
"bucket_id": "<bucket_id>"
},
"feature_extractor": {
"feature_extractor_name": "text_extractor",
"version": "v1",
"input_mappings": {
"text": "product_text"
},
"field_passthrough": [
{ "source_path": "metadata.category" }
],
"parameters": {
"model": "multilingual-e5-large-instruct"
}
}
}'
Collections immediately expose their deterministic output_schema, so you can build integrations before any documents are processed.
4. Register an Object
Objects simply register blobs and metadata in the bucket. Processing happens later.
curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/objects" \
-H "Authorization: Bearer $MP_API_KEY" \
-H "X-Namespace: $MP_NAMESPACE" \
-H "Content-Type: application/json" \
-d '{
"key_prefix": "/catalog",
"metadata": { "category": "headphones" },
"blobs": [
{
"property": "product_text",
"type": "text",
"data": "Lightweight wireless headphones with active noise cancellation."
}
]
}'
Store the returned object_id.
5. Create and Submit a Batch
A batch flattens objects into per-extractor artifacts and dispatches the Engine. Create one from the IDs of the objects you want to process:
curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/batches" \
-H "Authorization: Bearer $MP_API_KEY" \
-H "X-Namespace: $MP_NAMESPACE" \
-H "Content-Type: application/json" \
-d '{
"object_ids": ["<object_id>"]
}'
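When a batch covers several objects, the `object_ids` array can be assembled in the shell instead of edited by hand. A minimal sketch: `batch_payload` is a hypothetical helper, and it assumes object IDs contain no characters that need JSON escaping.

```shell
# batch_payload ID [ID ...]
# Hypothetical helper: prints a JSON body whose "object_ids" array lists the arguments.
batch_payload() {
  ids=""
  for id in "$@"; do
    # Prefix every entry after the first with a comma separator.
    ids="${ids:+$ids, }\"$id\""
  done
  printf '{ "object_ids": [%s] }\n' "$ids"
}
```

You could then pass `-d "$(batch_payload "$OBJECT_ID_1" "$OBJECT_ID_2")"` to the batch request above, where the two variable names stand in for IDs you stored earlier.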
Submit the batch for processing, using the batch_id from the previous response, and note the returned task_id:
curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/batches/<batch_id>/submit" \
-H "Authorization: Bearer $MP_API_KEY" \
-H "X-Namespace: $MP_NAMESPACE" \
-H "Content-Type: application/json" \
-d '{ "include_processing_history": true }'
6. Track Task Progress
Task metadata lives in Redis with MongoDB persistence. Poll until status is COMPLETED (fall back to the batch resource if the task ages out after 24 hours).
curl -sS -X GET "$MP_API_URL/v1/tasks/<task_id>" \
-H "Authorization: Bearer $MP_API_KEY" \
-H "X-Namespace: $MP_NAMESPACE"
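The polling described above can be wrapped in a loop with a fixed interval and an attempt limit. This is a sketch under stated assumptions: `wait_for_task` is a hypothetical helper, the response is assumed to expose a string field named `status`, and `FAILED` is an assumed terminal value that this guide does not confirm, so adjust it to the statuses your API returns.

```shell
# wait_for_task TASK_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Returns 0 on COMPLETED, 1 on FAILED (assumed value), 2 on timeout.
wait_for_task() {
  task_id=$1
  interval=${2:-5}
  max_attempts=${3:-60}
  attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    # Fetch the task and extract "status" (regex sketch, not a full JSON parser).
    task_status=$(curl -sS -X GET "$MP_API_URL/v1/tasks/$task_id" \
      -H "Authorization: Bearer $MP_API_KEY" \
      -H "X-Namespace: $MP_NAMESPACE" |
      tr -d '\n' |
      sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')

    echo "attempt $attempt: status=${task_status:-unknown}" >&2

    case $task_status in
      COMPLETED) return 0 ;;
      FAILED) return 1 ;;  # assumed failure status
    esac

    sleep "$interval"
    attempt=$((attempt + 1))
  done

  echo "timed out waiting for task $task_id" >&2
  return 2
}
```

For example, `wait_for_task "<task_id>" 5 60` checks every five seconds for up to five minutes. Progress messages go to stderr so the exit code alone can drive a script.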
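7. Inspect Documents
Once the task completes, list the documents the collection produced. The filter below matches the category passed through from the object metadata.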
curl -sS -X POST "$MP_API_URL/v1/collections/<collection_id>/documents/list" \
-H "Authorization: Bearer $MP_API_KEY" \
-H "X-Namespace: $MP_NAMESPACE" \
-H "Content-Type: application/json" \
-d '{
"limit": 10,
"filters": {
"field": "metadata.category",
"operator": "eq",
"value": "headphones"
},
"return_url": false
}'
Every document includes lineage back to the root object (root_object_id) and feature URIs you can query later.
8. Create a Retriever
Retrievers combine stage-based pipelines and cache-aware execution. The example below performs semantic search with stage-level sorting.
curl -sS -X POST "$MP_API_URL/v1/retrievers" \
-H "Authorization: Bearer $MP_API_KEY" \
-H "X-Namespace: $MP_NAMESPACE" \
-H "Content-Type: application/json" \
-d '{
"retriever_name": "quickstart-search",
"description": "Semantic search over product descriptions",
"input_schema": {
"properties": {
"query_text": { "type": "text", "required": true }
}
},
"collection_ids": ["<collection_id>"],
"stages": [
{
"stage_name": "knn_search",
"version": "v1",
"parameters": {
"feature_address": "mixpeek://text_extractor@v1/text_embedding",
"input_mapping": { "text": "query_text" },
"limit": 20,
"sort_by": [
{ "field": "score", "direction": "desc" }
]
}
}
],
"cache_config": {
"enabled": true,
"ttl_seconds": 300
}
}'
Execute the retriever:
curl -sS -X POST "$MP_API_URL/v1/retrievers/<retriever_id>/execute" \
-H "Authorization: Bearer $MP_API_KEY" \
-H "X-Namespace: $MP_NAMESPACE" \
-H "Content-Type: application/json" \
-d '{
"inputs": { "query_text": "wireless headphones with noise cancelling" },
"limit": 5,
"return_urls": false
}'
Responses include execution telemetry (stage_statistics, budget, execution_id) so you can troubleshoot latency or cache behavior.
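To run the execute call above with arbitrary query text, the payload can be built from a shell argument, escaping quotes so the JSON stays valid. A minimal sketch: `json_escape` and `search_products` are hypothetical helpers, and `MP_RETRIEVER_ID` is an assumed environment variable holding the retriever ID returned when you created the retriever.

```shell
# json_escape TEXT: escape backslashes and double quotes for use inside a JSON string.
# Sketch only: control characters such as newlines are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# search_products QUERY [LIMIT]
# Hypothetical wrapper around the retriever execute request shown above.
search_products() {
  query=$(json_escape "$1")
  limit=${2:-5}
  curl -sS -X POST "$MP_API_URL/v1/retrievers/$MP_RETRIEVER_ID/execute" \
    -H "Authorization: Bearer $MP_API_KEY" \
    -H "X-Namespace: $MP_NAMESPACE" \
    -H "Content-Type: application/json" \
    -d "{ \"inputs\": { \"query_text\": \"$query\" }, \"limit\": $limit, \"return_urls\": false }"
}
```

A call such as `search_products 'wireless headphones with noise cancelling' 5` sends the same request body as the example above.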
9. (Optional) Enrich with a Taxonomy
Taxonomies reuse retrievers under the hood to enrich documents via JOIN stages.
curl -sS -X POST "$MP_API_URL/v1/taxonomies" \
-H "Authorization: Bearer $MP_API_KEY" \
-H "X-Namespace: $MP_NAMESPACE" \
-H "Content-Type: application/json" \
-d '{
"taxonomy_name": "product-categories",
"taxonomy_type": "flat",
"retriever_id": "<retriever_id>",
"input_mappings": {
"query_embedding": "mixpeek://text_extractor@v1/text_embedding"
},
"source_collection": {
"collection_id": "<collection_id>",
"enrichment_fields": [
{ "field_path": "metadata.category", "merge_mode": "replace" }
]
}
}'
Attach the taxonomy to your collection’s taxonomy_applications for materialized enrichment, or add a taxonomy stage to the retriever for on-demand enrichment.
Where to Go Next
Need help? Click “Talk to Engineers” in the top bar and we’ll assist with deployment, scaling, or integration design.