This guide walks through the end-to-end Mixpeek workflow: create an isolated namespace, register raw objects, materialize features through the Engine, and query the results with a stage-based retriever. Every request matches the current OpenAPI specification.

Prerequisites

  • A Mixpeek account and API key (obtain one at mixpeek.com/start)
  • curl (or an HTTP client of your choice)
  • Basic familiarity with JSON payloads
export MP_API_URL="https://api.mixpeek.com"
export MP_API_KEY="sk_live_replace_me"
Once a namespace exists, every request sends these two headers (the namespace-creation call in step 1 sends only Authorization):
-H "Authorization: Bearer $MP_API_KEY"
-H "X-Namespace: $MP_NAMESPACE"   # exported at the end of step 1
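If you would rather not repeat these headers on every call, a small shell wrapper can add them for you. The sketch below is hypothetical and not part of any Mixpeek tooling: the function name mp_api and the MP_DRY_RUN switch are invented here. It sends X-Namespace only when MP_NAMESPACE is set, because the namespace does not exist yet in step 1.

```shell
#!/bin/sh
# Hypothetical wrapper (not Mixpeek tooling): adds the auth header, plus
# X-Namespace when MP_NAMESPACE is set, to every curl call.
# Usage: mp_api METHOD PATH [JSON_BODY]
mp_api() {
  _mp_method="$1"; _mp_path="$2"; _mp_body="${3-}"
  # Rebuild the positional parameters as the curl argument list.
  set -- -sS -X "$_mp_method" "${MP_API_URL:?set MP_API_URL}$_mp_path" \
    -H "Authorization: Bearer ${MP_API_KEY:?set MP_API_KEY}"
  # Step 1 runs before a namespace exists, so this header is optional.
  if [ -n "${MP_NAMESPACE-}" ]; then
    set -- "$@" -H "X-Namespace: $MP_NAMESPACE"
  fi
  # Only requests with a body need Content-Type and -d.
  if [ -n "$_mp_body" ]; then
    set -- "$@" -H "Content-Type: application/json" -d "$_mp_body"
  fi
  # MP_DRY_RUN=1 prints the command instead of sending the request.
  if [ "${MP_DRY_RUN-0}" = "1" ]; then
    printf 'curl %s\n' "$*"
  else
    curl "$@"
  fi
}

# Example: mp_api GET "/v1/tasks/<task_id>"
```

The dry-run switch is there so you can check the exact headers before spending a real request.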

1. Create (or Choose) a Namespace

Namespaces guarantee tenant isolation across MongoDB, Qdrant, Redis, and task execution. If you already have one, export its ID as MP_NAMESPACE and skip to step 2.
curl -sS -X POST "$MP_API_URL/v1/namespaces" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "namespace_name": "quickstart",
    "description": "Docs quickstart namespace",
    "feature_extractors": [
      { "feature_extractor_name": "text_extractor", "version": "v1" }
    ]
  }'
Copy the namespace_id from the response and export it:
export MP_NAMESPACE="ns_quickstart"   # replace with your namespace_id
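Rather than copying IDs by hand, you can capture them straight from the response. When jq is installed, `jq -r '.namespace_id'` is the usual tool; the dependency-free sketch below uses sed instead. json_field is a hypothetical helper, and it assumes the key occurs once with a plain string value (no escaped quotes). It should work the same way for the bucket_id, object_id, batch_id, and task_id returned in later steps, provided those responses have the same simple shape.

```shell
#!/bin/sh
# Sketch: print the string value of KEY from JSON on stdin, without jq.
# Assumes KEY occurs once, with a plain string value (no escaped quotes).
# Works for single-line and pretty-printed responses alike.
# Usage: ... | json_field KEY
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Pipe the step 1 request into it, e.g.:
#   MP_NAMESPACE="$(curl -sS -X POST "$MP_API_URL/v1/namespaces" ... | json_field namespace_id)"
#   export MP_NAMESPACE
```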

2. Create a Bucket

Buckets validate each object's shape against the bucket schema and track its blobs in S3-compatible storage.
curl -sS -X POST "$MP_API_URL/v1/buckets" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_name": "quickstart-bucket",
    "description": "Sample product descriptions",
    "schema": {
      "properties": {
        "product_text": { "type": "text", "required": true }
      }
    }
  }'
Note the bucket_id in the response; the following steps use it wherever <bucket_id> appears.

3. Define a Collection with a Feature Extractor

Collections turn bucket objects into documents by running a feature extractor on the Engine. In v2, feature_extractor is a single object, not a list.
curl -sS -X POST "$MP_API_URL/v1/collections" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "quickstart-docs",
    "description": "Embeddings for product text",
    "source": {
      "type": "bucket",
      "bucket_id": "<bucket_id>"
    },
    "feature_extractor": {
      "feature_extractor_name": "text_extractor",
      "version": "v1",
      "input_mappings": {
        "text": "product_text"
      },
      "field_passthrough": [
        { "source_path": "metadata.category" }
      ],
      "parameters": {
        "model": "multilingual-e5-large-instruct"
      }
    }
  }'
Collections immediately expose their deterministic output_schema, so you can build integrations before any documents are processed.
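The request above hardcodes <bucket_id> inside a single-quoted body, where the shell does not expand variables. One way to splice in the real ID is an unquoted heredoc, sketched below. collection_payload is a hypothetical helper, and the usage comment assumes you exported the ID as MP_BUCKET_ID, a variable name chosen here for illustration.

```shell
#!/bin/sh
# Sketch: print the step 3 collection body with a real bucket id filled in.
# Usage: collection_payload BUCKET_ID
collection_payload() {
  # Fail with a usage message if no bucket id was passed.
  : "${1:?usage: collection_payload BUCKET_ID}"
  # Unquoted EOF delimiter: $1 below is expanded by the shell.
  cat <<EOF
{
  "collection_name": "quickstart-docs",
  "description": "Embeddings for product text",
  "source": { "type": "bucket", "bucket_id": "$1" },
  "feature_extractor": {
    "feature_extractor_name": "text_extractor",
    "version": "v1",
    "input_mappings": { "text": "product_text" },
    "field_passthrough": [ { "source_path": "metadata.category" } ],
    "parameters": { "model": "multilingual-e5-large-instruct" }
  }
}
EOF
}

# MP_BUCKET_ID is a hypothetical variable holding the id from step 2:
#   curl -sS -X POST "$MP_API_URL/v1/collections" \
#     -H "Authorization: Bearer $MP_API_KEY" -H "X-Namespace: $MP_NAMESPACE" \
#     -H "Content-Type: application/json" -d "$(collection_payload "$MP_BUCKET_ID")"
```

Quoting the delimiter instead (<<'EOF') would suppress expansion and send the literal text $1.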

4. Register an Object

Registering an object only records its blobs and metadata in the bucket; nothing is processed until you submit a batch in step 5.
curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/objects" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "key_prefix": "/catalog",
    "metadata": { "category": "headphones" },
    "blobs": [
      {
        "property": "product_text",
        "type": "text",
        "data": "Lightweight wireless headphones with active noise cancellation."
      }
    ]
  }'
Store the returned object_id.
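The object body above embeds the product text directly in single-quoted JSON, which breaks as soon as a description contains a double quote, a backslash, or a line break. Below is a sketch of one way to escape such text using only sed and awk. json_string and object_payload are hypothetical helper names, and the escaping covers backslashes, double quotes, tabs, and newlines but not other control characters.

```shell
#!/bin/sh
# Sketch: print a shell string as a JSON string literal, quotes included.
# Escapes backslashes, double quotes, tabs, and newlines; other control
# characters are left as-is and a trailing newline is dropped.
# Backslashes are escaped first so later replacements are not doubled;
# awk then rejoins the input lines with a literal \n.
json_string() {
  _js_tab="$(printf '\t')"
  printf '%s' "$1" |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e "s/$_js_tab/\\\\t/g" |
    awk 'BEGIN { printf "\"" }
         NR > 1 { printf "%s", "\\n" }
         { printf "%s", $0 }
         END { printf "\"" }'
}

# Hypothetical helper: the step 4 object body with escaped values.
# Usage: object_payload CATEGORY TEXT
object_payload() {
  printf '{"key_prefix": "/catalog", "metadata": {"category": %s}, "blobs": [{"property": "product_text", "type": "text", "data": %s}]}\n' \
    "$(json_string "$1")" "$(json_string "$2")"
}

# description is a hypothetical shell variable:
#   curl ... -d "$(object_payload headphones "$description")"
```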

5. Create and Submit a Batch

A batch flattens the listed objects into per-extractor artifacts, and submitting it dispatches that work to the Engine. First, create the batch:
curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/batches" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "object_ids": ["<object_id>"]
  }'
Then submit it, substituting the batch_id from the previous response, and note the task_id that comes back:
curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/batches/<batch_id>/submit" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{ "include_processing_history": true }'
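The batch above lists a single object, but object_ids is an array, so one batch can cover every object you registered in step 4. A minimal sketch for building that body from a list of IDs follows. json_string_array and batch_payload are hypothetical helpers, and they assume the IDs are plain tokens without quotes or backslashes.

```shell
#!/bin/sh
# Sketch: build the step 5 batch body from any number of object ids.
# Assumes ids are plain tokens with no quotes or backslashes.
json_string_array() {
  _ja_out=""
  for _ja_id in "$@"; do
    # Prefix ", " only when something has already been added.
    _ja_out="${_ja_out:+$_ja_out, }\"$_ja_id\""
  done
  printf '[%s]\n' "$_ja_out"
}

# Usage: batch_payload OBJECT_ID...
batch_payload() {
  printf '{"object_ids": %s}\n' "$(json_string_array "$@")"
}

#   curl ... "$MP_API_URL/v1/buckets/<bucket_id>/batches" -d "$(batch_payload obj_a obj_b)"
```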

6. Track Task Progress

Task metadata lives in Redis and is persisted to MongoDB. Poll until the status is COMPLETED. Task records age out after 24 hours; if yours has expired, fall back to checking the batch resource.
curl -sS -X GET "$MP_API_URL/v1/tasks/<task_id>" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE"
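Polling by hand gets tedious, so here is a sketch of a bounded polling loop. poll_task and task_status are hypothetical helpers. The status command is passed in as an argument so it can be swapped for a stub; task_status assumes the task JSON carries its state in a single string field named status, and treating FAILED as a terminal state is an assumption, so check the task schema for the exact status values.

```shell
#!/bin/sh
# Sketch: run a status command until it prints COMPLETED.
# Returns 0 on COMPLETED, 1 on FAILED (assumed status name), 2 on timeout.
# Usage: poll_task MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
poll_task() {
  _pt_max="$1"; _pt_interval="$2"; shift 2
  _pt_n=1
  while [ "$_pt_n" -le "$_pt_max" ]; do
    _pt_status="$("$@")"
    echo "attempt $_pt_n: ${_pt_status:-<none>}" >&2
    case "$_pt_status" in
      COMPLETED) return 0 ;;
      FAILED) return 1 ;;
    esac
    # Sleep only if another attempt follows.
    if [ "$_pt_n" -lt "$_pt_max" ]; then
      sleep "$_pt_interval"
    fi
    _pt_n=$((_pt_n + 1))
  done
  echo "no terminal status after $_pt_max attempts" >&2
  return 2
}

# Hypothetical fetcher: assumes a single string "status" field in the task JSON.
task_status() {
  curl -sS "$MP_API_URL/v1/tasks/$1" \
    -H "Authorization: Bearer $MP_API_KEY" \
    -H "X-Namespace: $MP_NAMESPACE" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# poll_task 60 5 task_status "<task_id>"   # up to about 5 minutes
```

A fixed interval keeps the sketch simple; if you expect long-running batches, a growing interval would reduce request volume.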

7. Inspect Documents

curl -sS -X POST "$MP_API_URL/v1/collections/<collection_id>/documents/list" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "limit": 10,
    "filters": {
      "field": "metadata.category",
      "operator": "eq",
      "value": "headphones"
    },
    "return_url": false
  }'
Every document includes lineage back to the root object (root_object_id) and feature URIs you can query later.

8. Create a Retriever

Retrievers are stage-based pipelines with cache-aware execution. The example below runs a single knn_search stage over the text embedding, sorts matches by score within that stage, and caches results for 300 seconds.
curl -sS -X POST "$MP_API_URL/v1/retrievers" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "retriever_name": "quickstart-search",
    "description": "Semantic search over product descriptions",
    "input_schema": {
      "properties": {
        "query_text": { "type": "text", "required": true }
      }
    },
    "collection_ids": ["<collection_id>"],
    "stages": [
      {
        "stage_name": "knn_search",
        "version": "v1",
        "parameters": {
          "feature_address": "mixpeek://text_extractor@v1/text_embedding",
          "input_mapping": { "text": "query_text" },
          "limit": 20,
          "sort_by": [
            { "field": "score", "direction": "desc" }
          ]
        }
      }
    ],
    "cache_config": {
      "enabled": true,
      "ttl_seconds": 300
    }
  }'
Execute the retriever:
curl -sS -X POST "$MP_API_URL/v1/retrievers/<retriever_id>/execute" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": { "query_text": "wireless headphones with noise cancelling" },
    "limit": 5,
    "return_urls": false
  }'
Responses include execution telemetry (stage_statistics, budget, execution_id) so you can troubleshoot latency or cache behavior.
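The knn_search stage above (and the taxonomy in step 9) points at an embedding through a feature address, mixpeek://text_extractor@v1/text_embedding. Judging only from that example, the address appears to combine an extractor name, its version, and an output name. The sketch below splits one apart. The component names are our reading of the example rather than a documented grammar, and parse_feature_address is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: split a feature address into the parts it appears to contain,
# reading the format from this guide's examples only:
#   mixpeek://<extractor_name>@<version>/<output_name>
parse_feature_address() {
  _fa_rest="${1#mixpeek://}"
  # Reject input without the scheme.
  if [ "$_fa_rest" = "$1" ]; then
    echo "missing mixpeek:// scheme: $1" >&2; return 1
  fi
  # Require an "@" followed later by a "/".
  case "$_fa_rest" in
    *@*/*) ;;
    *) echo "expected <extractor>@<version>/<output>: $1" >&2; return 1 ;;
  esac
  _fa_extractor="${_fa_rest%%@*}"
  _fa_tail="${_fa_rest#*@}"
  _fa_version="${_fa_tail%%/*}"
  _fa_output="${_fa_tail#*/}"
  printf 'extractor=%s version=%s output=%s\n' \
    "$_fa_extractor" "$_fa_version" "$_fa_output"
}

# parse_feature_address "mixpeek://text_extractor@v1/text_embedding"
```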

9. (Optional) Enrich with a Taxonomy

Taxonomies reuse retrievers under the hood to enrich documents via JOIN stages.
curl -sS -X POST "$MP_API_URL/v1/taxonomies" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "taxonomy_name": "product-categories",
    "taxonomy_type": "flat",
    "retriever_id": "<retriever_id>",
    "input_mappings": {
      "query_embedding": "mixpeek://text_extractor@v1/text_embedding"
    },
    "source_collection": {
      "collection_id": "<collection_id>",
      "enrichment_fields": [
        { "field_path": "metadata.category", "merge_mode": "replace" }
      ]
    }
  }'
Attach the taxonomy to your collection’s taxonomy_applications for materialized enrichment, or add a taxonomy stage to the retriever for on-demand enrichment.
