Create a Bucket
A created bucket includes `bucket_id`, `schema` with validation metadata, `object_count`, and `created_at`.
Bucket Schema
- Uses a lightweight JSON schema subset (type, required, enum, description).
- Validates each object’s blobs before storing metadata.
- Helps collections map input fields to feature extractor targets.
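To make the schema subset concrete, here is a minimal sketch of how a validator for `type`, `required`, and `enum` could behave. It is not the service's implementation: the function name `validate`, the JSON Schema style layout (`properties` plus a top-level `required` list), and the set of type names are all assumptions made for illustration.

```python
from typing import Any

# Assumed mapping from schema type names to Python types (illustrative only).
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate(schema: dict[str, Any], value: dict[str, Any]) -> list[str]:
    """Return a list of error messages; an empty list means the value passed."""
    errors: list[str] = []

    # 1. Every field listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in value:
            errors.append(f"missing required field: {name}")

    # 2. Check "type" and "enum" for each declared field that is present.
    for name, rules in schema.get("properties", {}).items():
        if name not in value:
            continue
        field = value[name]
        type_name = rules.get("type")
        expected = _TYPES.get(type_name)
        if expected is not None:
            # bool is a subclass of int in Python, so reject it for numeric types.
            bool_as_number = isinstance(field, bool) and type_name in ("integer", "number")
            if bool_as_number or not isinstance(field, expected):
                errors.append(f"{name}: expected {type_name}")
                continue
        if "enum" in rules and field not in rules["enum"]:
            errors.append(f"{name}: {field!r} not in enum {rules['enum']}")

    # "description" is documentation only and is never validated.
    return errors


# Hypothetical schema for a product bucket.
schema = {
    "properties": {
        "title": {"type": "string", "description": "Product title"},
        "category": {"type": "string", "enum": ["shoes", "bags"]},
    },
    "required": ["title"],
}

print(validate(schema, {"title": "Boot", "category": "shoes"}))  # []
print(validate(schema, {"category": "hats"}))
```

The second call reports two problems: the missing `title` and a `category` outside the allowed values. Returning a list of errors instead of raising on the first one lets a caller report every problem with an object in a single pass.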
Manage Buckets
- Get bucket: `GET /v1/buckets/{bucket_id}`
- List buckets: `POST /v1/buckets/list` (supports filters, sort, pagination)
- Delete bucket: `DELETE /v1/buckets/{bucket_id}` (removes objects and blobs)
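The three management endpoints above can be wrapped in small helpers. The sketch below only builds the requests and does not send them. The host in `BASE_URL`, the bearer-token `Authorization` header, and the helper names are assumptions and are not taken from this page. The body of the list request is left to the caller because the filter, sort, and pagination field names are not specified here.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.example.com"  # hypothetical host; replace with the real API base URL


def _request(method: str, path: str, token: str,
             body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {token}"}  # assumed auth scheme
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def get_bucket(bucket_id: str, token: str) -> urllib.request.Request:
    return _request("GET", f"/v1/buckets/{bucket_id}", token)


def list_buckets(token: str, body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    # Listing is a POST so that filters, sort, and pagination travel in the body.
    return _request("POST", "/v1/buckets/list", token, body if body is not None else {})


def delete_bucket(bucket_id: str, token: str) -> urllib.request.Request:
    # Deleting a bucket also removes its objects and blobs.
    return _request("DELETE", f"/v1/buckets/{bucket_id}", token)


req = get_bucket("bkt_123", token="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

To execute a request, pass it to `urllib.request.urlopen(req)` and decode the JSON response. Keeping construction separate from sending makes the helpers easy to test without network access.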
Bucket vs Collection
| Aspect | Bucket | Collection |
|---|---|---|
| Purpose | Raw input registry | Processed documents + features |
| Schema | Blob validation | Output schema (deterministic) |
| Storage | MongoDB (metadata) + S3 (blobs) | MongoDB (metadata) + Qdrant (vectors/payloads) |
| Processing | None | Runs feature extractors via Engine |
Best Practices
- One bucket per data domain (products, support tickets, surveillance footage).
- Keep schemas coarse; collections can slice the data differently downstream.
- Use `key_prefix` in objects to group files (e.g., `/2025/01/`).
- Leverage metadata for later filtering (set tags at ingestion time).
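As a rough illustration of the last two practices, the sketch below derives a date-based `key_prefix` and attaches tags at ingestion time. The helper names and the shape of the object dictionary (a `metadata` field holding `tags`) are hypothetical. Only `key_prefix` and the idea of filterable metadata come from this page.

```python
from datetime import date
from typing import Any


def date_key_prefix(day: date) -> str:
    """Build a year/month prefix such as /2025/01/ for grouping files."""
    return f"/{day.year:04d}/{day.month:02d}/"


def build_object(day: date, tags: list[str]) -> dict[str, Any]:
    """Assemble a hypothetical object payload with a prefix and filterable tags."""
    return {
        "key_prefix": date_key_prefix(day),
        # Deduplicate and sort tags so later filters see a stable value.
        "metadata": {"tags": sorted(set(tags))},
    }


obj = build_object(date(2025, 1, 15), ["footwear", "winter", "footwear"])
print(obj["key_prefix"])  # /2025/01/
```

Deriving the prefix from a single helper keeps every uploader consistent, so files from the same month always land under the same group.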

