Uploads & Ingestion Patterns

Uploads in Mixpeek consist of creating objects inside schema-backed buckets and, optionally, bundling them into batches for processing. Whether you are uploading one file from a client or importing millions of records from a data lake, the workflow follows the same pattern: register → batch → submit → monitor.

Single Object Upload

```bash
curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/objects" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "key_prefix": "/catalog/2025-10/",
    "metadata": { "category": "headphones" },
    "skip_duplicates": true,
    "blobs": [
      {
        "property": "product_text",
        "type": "text",
        "data": "Lightweight wireless headphones with adaptive ANC."
      },
      {
        "property": "hero_image",
        "type": "image",
        "data": "https://cdn.example.com/headphones.jpg"
      }
    ]
  }'
```

skip_duplicates compares blob hashes to avoid storing identical uploads.
key_prefix provides logical grouping for later queries.
Object creation is idempotent—safe to retry failed requests.
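Because object creation is described above as safe to retry, you can handle a transient failure by repeating the same request with exponential backoff. The following is a minimal bash sketch and is not part of any Mixpeek SDK. `retry_with_backoff` and its arguments are hypothetical names. It retries on any non-zero exit status of the wrapped command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, retrying with exponential backoff.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while true; do
    # Stop as soon as the command succeeds.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry_with_backoff: giving up after $attempt attempts" >&2
      return 1
    fi
    echo "retry_with_backoff: attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))   # double the wait before each new attempt
  done
}
```

For example, put `retry_with_backoff 5 1` in front of the single-object `curl` command and add `--fail`, which makes curl exit non-zero on HTTP error responses. A 4xx validation error fails the same way on every attempt, so in practice you may prefer to retry only on 5xx responses or network errors.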

Batch Upload

Bulk ingestion is faster when you create objects in batch and submit a single processing job.

```bash
curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/objects/batch" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "objects": [
      {
        "key_prefix": "/catalog/headphones",
        "metadata": { "category": "audio" },
        "blobs": [
          { "property": "product_text", "type": "text", "data": "Wireless over-ear headphones..." }
        ]
      },
      {
        "key_prefix": "/catalog/speakers",
        "metadata": { "category": "audio" },
        "blobs": [
          { "property": "product_text", "type": "text", "data": "Compact smart speaker..." }
        ]
      }
    ]
  }'
```

Then create and submit a batch:

```bash
curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/batches" \
  ...
```

```bash
curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/batches/<batch_id>/submit" \
  ...
```

The submit response includes a task_id; poll /v1/tasks/{task_id} or watch the batch resource directly.
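The polling option can be sketched as a small bash loop around `/v1/tasks/{task_id}`. This sketch assumes that the task response carries a top-level string field named `status` and that its terminal values include `COMPLETED`, `FAILED`, and `CANCELED`. Those field and status names are hypothetical, so check the Tasks API reference before relying on them. The JSON is parsed with `grep` and `sed` to avoid a `jq` dependency, which works only for a simple, flat string field. If `jq` is available, `jq -r '.status'` is more robust.

```bash
#!/usr/bin/env bash
# Fetch the raw task JSON (endpoint path taken from the text above).
fetch_task() {
  curl -sS "$MP_API_URL/v1/tasks/$1" \
    -H "Authorization: Bearer $MP_API_KEY" \
    -H "X-Namespace: $MP_NAMESPACE"
}

# Print the value of the first "status": "..." pair in the JSON on stdin.
# Assumed (hypothetical) response schema; a real JSON parser is more robust.
extract_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[A-Za-z_]*"' | head -n 1 |
    sed 's/.*"\([A-Za-z_]*\)"$/\1/'
}

# Poll until the task reaches an assumed terminal state or attempts run out.
# Prints the final status; exit code 0 = completed, 1 = failed, 2 = timed out.
# Usage: wait_for_task TASK_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
wait_for_task() {
  local task_id="$1" interval="${2:-5}" max_attempts="${3:-120}"
  local attempt=1 status
  while [ "$attempt" -le "$max_attempts" ]; do
    status="$(fetch_task "$task_id" | extract_status)"
    case "$status" in
      COMPLETED) echo "$status"; return 0 ;;
      FAILED|CANCELED) echo "$status"; return 1 ;;
    esac
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "TIMEOUT"
  return 2
}
```

For example, `wait_for_task "$TASK_ID" 10 360` would poll every 10 seconds for up to an hour and print the final status. Its exit code can then gate downstream steps.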

Upload Sources

Source	Strategy
Browser / client app	Upload to your storage, then register object URLs with Mixpeek
Data lake (S3/GCS)	Point blobs to existing URIs and set `skip_duplicates=true`
Streaming ingestion	Combine incremental object creation with periodic batch submissions
Large archives	Pre-process externally, split into multiple objects, and enqueue batches
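For the data-lake row, you can generate the `objects` array for the batch endpoint from a plain list of existing URIs instead of writing it by hand. The bash sketch below uses two hypothetical helpers, `json_escape` and `uris_to_batch_body`. It makes two assumptions, both of which you should check against your bucket schema and the API reference:

- every URI is an image for the `hero_image` property used in the single-object example;
- `skip_duplicates` is accepted on each object in a batch request, mirroring the single-object body.

```bash
#!/usr/bin/env bash
# Escape backslashes and double quotes so a value can be embedded in a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Read URIs (one per line) from stdin and print a batch request body.
# Hypothetical assumptions: each URI is an image for the "hero_image" property,
# and "skip_duplicates" is accepted per object, as in the single-object request.
# Usage: uris_to_batch_body KEY_PREFIX < uris.txt
uris_to_batch_body() {
  local prefix="$1" first=1 uri
  printf '{"objects":['
  # "|| [ -n ... ]" keeps a final line that has no trailing newline.
  while IFS= read -r uri || [ -n "$uri" ]; do
    [ -n "$uri" ] || continue          # skip blank lines
    [ "$first" -eq 1 ] || printf ','   # comma between objects, not before the first
    first=0
    printf '{"key_prefix":"%s","skip_duplicates":true,"blobs":[{"property":"hero_image","type":"image","data":"%s"}]}' \
      "$(json_escape "$prefix")" "$(json_escape "$uri")"
  done
  printf ']}\n'
}
```

You can pipe the output straight into the batch call shown earlier, for example `uris_to_batch_body /catalog/images/ < uris.txt | curl -sS -X POST "$MP_API_URL/v1/buckets/<bucket_id>/objects/batch"`. Keep the same three `-H` headers and use `--data-binary @-` in place of `-d '...'`. Escaping covers only backslashes and double quotes. That is enough for typical URIs but not for arbitrary text containing control characters.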

Tips for High-Volume Imports

Use multiple smaller batches (1k–10k objects) to keep pollers responsive.
Parallelize object registration—the API layer scales horizontally.
Monitor tasks with the Tasks API or webhooks to coordinate downstream systems.
Leverage retries; object creation and batch submission are safe to repeat on failure.
Tag objects with metadata that retrieval filters or taxonomy enrichment will need later.
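You can combine the first two tips: split a large URI list into batch-sized chunks, then process several chunks concurrently. The sketch below assumes bash with the standard `split`, `find`, and `xargs` utilities. `xargs -P` is available in both GNU and BSD xargs but is not POSIX. `chunk_uris` and `for_each_chunk_parallel` are hypothetical names.

```bash
#!/usr/bin/env bash
# Split a newline-delimited file into chunk files of at most SIZE lines.
# Usage: chunk_uris INPUT_FILE SIZE OUTPUT_DIR
chunk_uris() {
  local input="$1" size="$2" outdir="$3"
  mkdir -p "$outdir"
  # -a 4 gives suffixes aaaa, aaab, ... so files sort in input order.
  split -l "$size" -a 4 "$input" "$outdir/chunk_"
}

# Run COMMAND once per chunk file (passed as its last argument), at most JOBS at a time.
# COMMAND must be an executable or script, not a shell function, because xargs
# starts it as a separate process.
# Usage: for_each_chunk_parallel OUTPUT_DIR JOBS COMMAND [ARGS...]
for_each_chunk_parallel() {
  local outdir="$1" jobs="$2"
  shift 2
  find "$outdir" -type f -name 'chunk_*' -print0 | xargs -0 -n 1 -P "$jobs" "$@"
}
```

Splitting a 2,500-line list with `chunk_uris uris.txt 1000 ./chunks` yields three files of 1000, 1000, and 500 lines. Each chunk can then feed one batch through a script of your own, such as a hypothetical `./submit_chunk.sh` that builds the request body, creates the objects, and submits the batch. Run it with `for_each_chunk_parallel ./chunks 4 ./submit_chunk.sh`. Keep the job count modest so concurrent requests stay within any rate limits that apply to your account.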

Uploads are just the start—once registered, objects can feed multiple collection pipelines, ensuring you only upload once regardless of how many features or retrievers you build later.
