Buckets

Buckets organize raw inputs before the Engine transforms them into documents. Each bucket enforces a JSON schema that describes the blobs you expect to ingest (text, image, audio, video, json, binary).

Create a Bucket

curl -sS -X POST "$MP_API_URL/v1/buckets" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_name": "product-catalog",
    "description": "E-commerce product data",
    "schema": {
      "properties": {
        "product_text": { "type": "text", "required": true },
        "hero_image": { "type": "image" },
        "spec_sheet": { "type": "json" }
      }
    }
  }'

Response fields:

bucket_id
schema with validation metadata
object_count
created_at
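
The same request can be prepared from Python. The sketch below uses only the standard library and reuses the environment variable names and payload from the curl command above. The helper name `build_create_bucket_request` and the placeholder defaults are hypothetical. It builds the request without sending it, so it can be inspected first.

```python
import json
import os
import urllib.request


def build_create_bucket_request(api_url, api_key, namespace, payload):
    """Build (but do not send) the POST /v1/buckets request shown above."""
    return urllib.request.Request(
        url=f"{api_url.rstrip('/')}/v1/buckets",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace,
            "Content-Type": "application/json",
        },
    )


payload = {
    "bucket_name": "product-catalog",
    "description": "E-commerce product data",
    "schema": {
        "properties": {
            "product_text": {"type": "text", "required": True},
            "hero_image": {"type": "image"},
            "spec_sheet": {"type": "json"},
        }
    },
}

# The defaults are placeholders so the sketch runs without credentials.
request = build_create_bucket_request(
    os.environ.get("MP_API_URL", "https://api.example.invalid"),
    os.environ.get("MP_API_KEY", "test-key"),
    os.environ.get("MP_NAMESPACE", "default"),
    payload,
)

# To send it (requires network access and valid credentials):
# with urllib.request.urlopen(request) as response:
#     bucket = json.load(response)
#     print(bucket["bucket_id"], bucket["object_count"], bucket["created_at"])
```

The commented-out lines read the response fields listed above; the exact response structure beyond those field names is not documented here.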

Bucket Schema

Uses a lightweight JSON schema subset (type, required, enum, description).
Validates each object’s blobs before storing metadata.
Helps collections map input fields to feature extractor targets.

Example schema fragment:

{
  "properties": {
    "transcript": {
      "type": "text",
      "description": "Full podcast transcript",
      "required": true
    },
    "audio_file": {
      "type": "audio",
      "required": true
    }
  }
}
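
To make the validation step concrete, here is a minimal client-side sketch of how an object could be checked against the schema subset described above (type, required, enum). It is not the service's implementation. The function name, the object shape (field name mapped to a dict with `type` and `value`), the error messages, and the choice to reject undeclared fields are all assumptions made for illustration.

```python
def validate_object(schema, blobs):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    properties = schema.get("properties", {})

    for name, rules in properties.items():
        blob = blobs.get(name)
        if blob is None:
            # Only fields marked required must be present.
            if rules.get("required", False):
                errors.append(f"{name}: required field is missing")
            continue

        expected = rules.get("type")
        actual = blob.get("type")
        if actual != expected:
            errors.append(f"{name}: expected type {expected!r}, got {actual!r}")

        if "enum" in rules and blob.get("value") not in rules["enum"]:
            errors.append(f"{name}: value not in enum {rules['enum']}")

    # Assumption: fields not declared in the schema are reported, not ignored.
    for name in blobs:
        if name not in properties:
            errors.append(f"{name}: not declared in bucket schema")

    return errors


podcast_schema = {
    "properties": {
        "transcript": {
            "type": "text",
            "description": "Full podcast transcript",
            "required": True,
        },
        "audio_file": {"type": "audio", "required": True},
    }
}

ok = validate_object(podcast_schema, {
    "transcript": {"type": "text", "value": "Welcome to the show."},
    "audio_file": {"type": "audio", "value": "episode-01.mp3"},
})
missing = validate_object(podcast_schema, {
    "transcript": {"type": "text", "value": "Welcome to the show."},
})

print(ok)       # []
print(missing)  # ['audio_file: required field is missing']
```

Running a check like this before upload can surface schema mismatches early, but the bucket's own validation remains the source of truth.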

Manage Buckets

Get bucket – GET /v1/buckets/{bucket_id}
List buckets – POST /v1/buckets/list (supports filters, sort, pagination)
Delete bucket – DELETE /v1/buckets/{bucket_id} (also removes the bucket's objects and blobs)

Buckets are strictly namespace-scoped: the same bucket name can exist in different namespaces without conflict.
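
The three management calls can be wrapped in small helpers. The sketch below builds the requests with Python's standard library and does not send them. The paths and methods come from the list above. The helper names are hypothetical, sending the same `Authorization` and `X-Namespace` headers as the create call is an assumption, and the body passed to the list request is an assumed shape, since this page names the supported features (filters, sort, pagination) but not their exact keys.

```python
import json
import urllib.request
from urllib.parse import quote


def _request(method, api_url, path, api_key, namespace, body=None):
    """Build an authenticated, namespace-scoped request without sending it."""
    headers = {"Authorization": f"Bearer {api_key}", "X-Namespace": namespace}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{api_url.rstrip('/')}{path}", data=data, method=method, headers=headers
    )


def get_bucket_request(api_url, api_key, namespace, bucket_id):
    path = f"/v1/buckets/{quote(bucket_id, safe='')}"
    return _request("GET", api_url, path, api_key, namespace)


def list_buckets_request(api_url, api_key, namespace, body=None):
    # Listing is a POST; filters, sort, and pagination go in the JSON body.
    # The keys inside `body` are an assumption, not documented on this page.
    return _request("POST", api_url, "/v1/buckets/list", api_key, namespace, body or {})


def delete_bucket_request(api_url, api_key, namespace, bucket_id):
    # Destructive: deleting a bucket also removes its objects and blobs.
    path = f"/v1/buckets/{quote(bucket_id, safe='')}"
    return _request("DELETE", api_url, path, api_key, namespace)


# Placeholder connection details for illustration only.
base = ("https://api.example.invalid", "test-key", "default")

print(get_bucket_request(*base, "bkt_123").full_url)
# https://api.example.invalid/v1/buckets/bkt_123
```

Each helper returns a `urllib.request.Request`; pass it to `urllib.request.urlopen` to execute it against a real deployment.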

Bucket vs Collection

Aspect	Bucket	Collection
Purpose	Raw input registry	Processed documents + features
Schema	Blob validation	Output schema (deterministic)
Storage	MongoDB (metadata) + S3 (blobs)	MongoDB (metadata) + Qdrant (vectors/payloads)
Processing	None	Runs feature extractors via Engine

Best Practices

One bucket per data domain (products, support tickets, surveillance footage).
Keep schemas coarse; collections can slice the data differently downstream.
Use key_prefix in objects to group files (e.g., /2025/01/).
Leverage metadata for later filtering (set tags at ingestion time).
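
As a sketch of the last two practices, the helper below derives a date-based `key_prefix` in the `/2025/01/` style and attaches tags as metadata at ingestion time. Only the `key_prefix` and `metadata` field names come from this page. The helper names, the `tags` key, and the overall object shape are assumptions, so check them against the object ingestion reference before relying on them.

```python
from datetime import date


def date_key_prefix(day):
    """Return a /YYYY/MM/ prefix, e.g. /2025/01/ for any day in January 2025."""
    return f"/{day.year:04d}/{day.month:02d}/"


def object_fields(day, tags):
    """Build grouping and filtering fields for one object (assumed shape)."""
    return {
        "key_prefix": date_key_prefix(day),
        # Deduplicate and sort tags so later filters see consistent values.
        "metadata": {"tags": sorted(set(tags))},
    }


fields = object_fields(date(2025, 1, 15), ["catalog", "winter", "catalog"])
print(fields)
# {'key_prefix': '/2025/01/', 'metadata': {'tags': ['catalog', 'winter']}}
```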

Buckets give you a reliable staging area for multimodal data, keeping raw inputs cleanly separated before you branch into multiple collection-specific processing pipelines.
