Uploads in Mixpeek consist of creating objects inside schema-backed buckets and, optionally, bundling them into batches for processing. Whether you are uploading one file from a client or importing millions of records from a data lake, the workflow follows the same pattern: register → batch → submit → monitor.

Single Object Upload

curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/objects" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "key_prefix": "/catalog/2025-10/",
    "metadata": { "category": "headphones" },
    "skip_duplicates": true,
    "blobs": [
      {
        "property": "product_text",
        "type": "text",
        "data": "Lightweight wireless headphones with adaptive ANC."
      },
      {
        "property": "hero_image",
        "type": "image",
        "data": "https://cdn.example.com/headphones.jpg"
      }
    ]
  }'
  • skip_duplicates compares blob hashes to avoid storing identical uploads.
  • key_prefix provides logical grouping for later queries.
  • Object creation is idempotent—safe to retry failed requests.
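
For callers who prefer a script to curl, the same request might be assembled in Python roughly as follows. This is a minimal sketch using only the standard library: the helper names (`build_object_payload`, `build_create_object_request`, `send_json`) are hypothetical, and the assumption that the response body is JSON is not confirmed by this page. The endpoint, headers, and body fields are copied from the curl example above.

```python
import json
import urllib.request


def build_object_payload(key_prefix, metadata, blobs, skip_duplicates=True):
    """Assemble the JSON body shown in the curl example (hypothetical helper)."""
    return {
        "key_prefix": key_prefix,
        "metadata": metadata,
        "skip_duplicates": skip_duplicates,
        "blobs": blobs,
    }


def build_create_object_request(api_url, api_key, namespace, bucket_id, payload):
    """Build, but do not send, the POST request for object creation."""
    return urllib.request.Request(
        url=f"{api_url}/v1/buckets/{bucket_id}/objects",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace,
            "Content-Type": "application/json",
        },
    )


def send_json(request, timeout_s=30):
    """Send the request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


payload = build_object_payload(
    key_prefix="/catalog/2025-10/",
    metadata={"category": "headphones"},
    blobs=[
        {
            "property": "product_text",
            "type": "text",
            "data": "Lightweight wireless headphones with adaptive ANC.",
        },
        {
            "property": "hero_image",
            "type": "image",
            "data": "https://cdn.example.com/headphones.jpg",
        },
    ],
)
```

To send it, pass the values of `MP_API_URL`, `MP_API_KEY`, and `MP_NAMESPACE` plus a bucket ID to `build_create_object_request`, then hand the result to `send_json`. Keeping request construction separate from sending makes the payload easy to inspect or log before anything goes over the network.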

Batch Upload

Bulk ingestion is faster when you create objects in bulk with one request and then submit a single processing job.
curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/objects/batch" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "objects": [
      {
        "key_prefix": "/catalog/headphones",
        "metadata": { "category": "audio" },
        "blobs": [
          { "property": "product_text", "type": "text", "data": "Wireless over-ear headphones..." }
        ]
      },
      {
        "key_prefix": "/catalog/speakers",
        "metadata": { "category": "audio" },
        "blobs": [
          { "property": "product_text", "type": "text", "data": "Compact smart speaker..." }
        ]
      }
    ]
  }'
Then create and submit a batch:
curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/batches" \
  ...

curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/batches/<batch_id>/submit" \
  ...
The submit response includes a task_id; poll /v1/tasks/{task_id} or watch the batch resource directly.
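
Polling the task could look something like the loop below. It is a sketch under stated assumptions: this page does not show the task response schema, so `fetch_status` is a hypothetical callable that you would implement to GET /v1/tasks/{task_id} and return whichever field carries the task state, and the state names in the usage example are placeholders rather than documented values.

```python
import time


def poll_until_done(
    fetch_status,
    terminal_states,
    interval_s=5.0,
    timeout_s=600.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call fetch_status() until it returns a terminal state or time runs out.

    fetch_status is a caller-supplied function (hypothetical here) that
    returns the current task state as a string.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in terminal_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task still in state {status!r} after {timeout_s}s")
        sleep(interval_s)


# Usage with a stand-in fetcher; the state names are placeholders.
_fake_states = iter(["PENDING", "PROCESSING", "COMPLETED"])
final_state = poll_until_done(
    fetch_status=lambda: next(_fake_states),
    terminal_states={"COMPLETED", "FAILED"},
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. A fixed interval is the simplest choice; a longer interval or a webhook may suit long-running batches better.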

Upload Sources

Source | Strategy
Browser / client app | Upload to your storage, then register object URLs with Mixpeek
Data lake (S3/GCS) | Point blobs to existing URIs and set skip_duplicates=true
Streaming ingestion | Combine incremental object creation with periodic batch submissions
Large archives | Pre-process externally, split into multiple objects, and enqueue batches
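
The streaming row above can be illustrated with a small buffer that collects newly created object IDs and hands them off once a size or age threshold is reached. This is only a sketch: `BatchBuffer` and its `submit_batch` callback are hypothetical names, and the callback stands in for the create-and-submit batch calls, whose request bodies are not shown on this page.

```python
import time


class BatchBuffer:
    """Collect object IDs and flush them in groups (illustrative only)."""

    def __init__(self, submit_batch, max_size=1000, max_age_s=60.0, clock=time.monotonic):
        self._submit_batch = submit_batch  # hypothetical callback taking a list of IDs
        self._max_size = max_size
        self._max_age_s = max_age_s
        self._clock = clock
        self._pending = []
        self._oldest = None  # time the first pending ID was added

    def add(self, object_id):
        """Queue one ID; flush if the buffer is full or its oldest entry is stale."""
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(object_id)
        too_big = len(self._pending) >= self._max_size
        too_old = self._clock() - self._oldest >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self):
        """Hand all pending IDs to the callback and reset the buffer."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._submit_batch(batch)


# Usage: record batches in a list instead of calling the API.
submitted = []
buffer = BatchBuffer(submitted.append, max_size=2, max_age_s=3600.0)
for object_id in ["obj_a", "obj_b", "obj_c"]:
    buffer.add(object_id)
buffer.flush()  # push the final partial batch
```

Note that the age check only runs when `add` is called, so a quiet stream needs a periodic `flush()` from a timer or at shutdown.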

Tips for High-Volume Imports

  • Use multiple smaller batches (1k–10k objects) to keep pollers responsive.
  • Parallelize object registration—the API layer scales horizontally.
  • Monitor tasks with the Tasks API or webhooks to coordinate downstream systems.
  • Leverage retries; object creation and batch submission are safe to repeat on failure.
  • Tag metadata that will help with retrieval filters or taxonomy enrichment later.
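
Two of the tips above, splitting work into smaller batches and retrying on failure, can be sketched as a pair of helpers. Both `chunked` and `with_retries` are hypothetical names, the backoff parameters are arbitrary illustrative values, and the broad `except Exception` is a simplification: real code would likely retry only on transient errors such as timeouts or 5xx responses.

```python
import random
import time


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def with_retries(operation, attempts=5, base_delay_s=0.5, sleep=time.sleep):
    """Run operation(), retrying with exponential backoff plus jitter.

    Re-raises the last exception once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = base_delay_s * (2 ** attempt) + random.uniform(0, base_delay_s)
            sleep(delay)


# Usage: split 25,000 object IDs into batches of 5,000 (within the 1k-10k tip).
object_ids = [f"obj_{i}" for i in range(25_000)]
batches = list(chunked(object_ids, 5_000))
```

Each batch could then be submitted inside `with_retries`, which relies on the statement above that object creation and batch submission are safe to repeat.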
Uploads are just the start—once registered, objects can feed multiple collection pipelines, ensuring you only upload once regardless of how many features or retrievers you build later.