Buckets organize raw inputs before the Engine transforms them into documents. Each bucket enforces a JSON schema that describes the blobs you expect to ingest (text, image, audio, video, json, binary).

Create a Bucket

curl -sS -X POST "$MP_API_URL/v1/buckets" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_name": "product-catalog",
    "description": "E-commerce product data",
    "schema": {
      "properties": {
        "product_text": { "type": "text", "required": true },
        "hero_image": { "type": "image" },
        "spec_sheet": { "type": "json" }
      }
    }
  }'
Response fields:
  • bucket_id
  • schema with validation metadata
  • object_count
  • created_at
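
For readers calling the API from code rather than the shell, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_create_bucket_request` and `create_bucket` are hypothetical, and the URL, headers, and body simply mirror the curl example above.

```python
import json
import urllib.request


def build_create_bucket_request(api_url, api_key, namespace,
                                bucket_name, description, schema):
    """Build (but do not send) the POST /v1/buckets request shown above."""
    payload = {
        "bucket_name": bucket_name,
        "description": description,
        "schema": schema,
    }
    return urllib.request.Request(
        url=f"{api_url.rstrip('/')}/v1/buckets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_bucket(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without network access. Passing the built request to `create_bucket` should return a dictionary containing the response fields listed above, such as `bucket_id`.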

Bucket Schema

  • Uses a lightweight JSON schema subset (type, required, enum, description).
  • Validates each object’s blobs before storing metadata.
  • Helps collections map input fields to feature extractor targets.
Example schema fragment:
{
  "properties": {
    "transcript": {
      "type": "text",
      "description": "Full podcast transcript",
      "required": true
    },
    "audio_file": {
      "type": "audio",
      "required": true
    }
  }
}
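
To make the validation step concrete, here is a rough local pre-check written against the schema subset described above (type, required, enum). It is a sketch under stated assumptions, not the service's validator: the function name `validate_blobs` is hypothetical, and the blob shape it accepts (a mapping from property name to a dictionary with `type` and `value`) is invented for illustration, since the object format is not described in this section.

```python
ALLOWED_TYPES = {"text", "image", "audio", "video", "json", "binary"}


def validate_blobs(schema, blobs):
    """Return a list of problems; an empty list means the blobs fit the schema.

    `blobs` maps a property name to {"type": ..., "value": ...}. That shape is
    an assumption made for this sketch, not the documented object format.
    """
    errors = []
    for name, rule in schema.get("properties", {}).items():
        expected = rule.get("type")
        if expected not in ALLOWED_TYPES:
            errors.append(f"{name}: unknown schema type {expected!r}")
        blob = blobs.get(name)
        if blob is None:
            # Missing blobs only matter when the property is required.
            if rule.get("required", False):
                errors.append(f"{name}: required blob is missing")
            continue
        if blob.get("type") != expected:
            errors.append(
                f"{name}: expected type {expected!r}, got {blob.get('type')!r}"
            )
        if "enum" in rule and blob.get("value") not in rule["enum"]:
            errors.append(f"{name}: value not in enum {rule['enum']!r}")
    return errors
```

Run against the podcast schema above, an object that supplies only a transcript would be reported as missing its required `audio_file` blob.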

Manage Buckets

  • Get bucket: GET /v1/buckets/{bucket_id}
  • List buckets: POST /v1/buckets/list (supports filters, sort, pagination)
  • Delete bucket: DELETE /v1/buckets/{bucket_id} (removes objects and blobs)
Buckets are strictly namespace-scoped: the same bucket name can exist in different namespaces without conflict.
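
The three management calls can be sketched as small Python request builders. The helper names are hypothetical and this is only an illustration of the endpoints listed above. The filter, sort, and pagination field names for the list call are not given in this section, so the list body is passed through unchanged; sending an empty JSON object when no body is supplied is an assumption. Every request carries the `X-Namespace` header, since buckets are resolved within a namespace.

```python
import json
import urllib.parse
import urllib.request


def _bucket_request(api_url, api_key, namespace, method, path, body=None):
    """Build a namespace-scoped request; a JSON body is attached only if given."""
    headers = {"Authorization": f"Bearer {api_key}", "X-Namespace": namespace}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{api_url.rstrip('/')}{path}", data=data, headers=headers, method=method
    )


def get_bucket_request(api_url, api_key, namespace, bucket_id):
    path = f"/v1/buckets/{urllib.parse.quote(bucket_id, safe='')}"
    return _bucket_request(api_url, api_key, namespace, "GET", path)


def list_buckets_request(api_url, api_key, namespace, body=None):
    # Assumption: an empty JSON object is an acceptable "no filters" body.
    return _bucket_request(
        api_url, api_key, namespace, "POST", "/v1/buckets/list", body or {}
    )


def delete_bucket_request(api_url, api_key, namespace, bucket_id):
    # Per the list above, deleting a bucket also removes its objects and blobs.
    path = f"/v1/buckets/{urllib.parse.quote(bucket_id, safe='')}"
    return _bucket_request(api_url, api_key, namespace, "DELETE", path)
```

Each builder returns a `urllib.request.Request` that can be sent with `urllib.request.urlopen`.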

Bucket vs Collection

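| Aspect     | Bucket                             | Collection                                     |
|------------|------------------------------------|------------------------------------------------|
| Purpose    | Raw input registry                 | Processed documents + features                 |
| Schema     | Blob validation                    | Output schema (deterministic)                  |
| Storage    | MongoDB (metadata) + S3 (blobs)    | MongoDB (metadata) + Qdrant (vectors/payloads) |
| Processing | None                               | Runs feature extractors via Engine             |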

Best Practices

  • One bucket per data domain (products, support tickets, surveillance footage).
  • Keep schemas coarse; collections can slice the data differently downstream.
  • Use key_prefix in objects to group files (e.g., /2025/01/).
  • Leverage metadata for later filtering (set tags at ingestion time).
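
As one way to apply the last two practices, the sketch below derives a date-based `key_prefix` and attaches tags as metadata when preparing an object. Only the `key_prefix` and metadata concepts come from the list above; the helper names and the `{"tags": [...]}` metadata shape are assumptions for illustration, and the object-creation endpoint itself is not covered in this section.

```python
from datetime import date


def monthly_key_prefix(day):
    """Return a prefix such as /2025/01/ for the month containing `day`."""
    return f"/{day.year:04d}/{day.month:02d}/"


def build_object_fields(day, tags):
    """Hypothetical helper: group by month and record tags at ingestion time."""
    return {
        "key_prefix": monthly_key_prefix(day),
        # Assumed shape; the documented metadata format is not shown here.
        "metadata": {"tags": list(tags)},
    }
```

For example, `monthly_key_prefix(date(2025, 1, 15))` yields `/2025/01/`, matching the prefix style suggested above.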
Buckets give you a reliable staging area for multimodal data: raw inputs stay cleanly separated from processing, so you can branch into multiple collection-specific pipelines later.