Creating an object does not trigger feature extraction. Processing happens when you create and submit a batch.
Object Schema
- key_prefix (optional) – Logical path to help organize downstream documents.
- metadata (optional) – Arbitrary JSON copied into documents through field passthrough.
- blobs (required) – Each entry must match a property defined in the bucket schema.
Supported blob types: text, json, image, video, audio, binary. Blob data can be raw content, a base64 payload, or a URL Mixpeek can fetch.
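The schema rules above can be checked on the client before anything is uploaded. The following is a minimal sketch, not Mixpeek's own validation logic: the blob field names `property`, `type`, and `data` are assumptions for illustration, and the bucket schema is reduced to a plain set of property names.

```python
# Blob types listed in the Object Schema section.
ALLOWED_BLOB_TYPES = {"text", "json", "image", "video", "audio", "binary"}


def validate_object(obj: dict, schema_properties: set) -> list:
    """Return a list of problems; an empty list means no problem was found.

    `schema_properties` stands in for the property names defined in the
    bucket schema. The blob keys "property" and "type" are assumed names.
    """
    if "blobs" not in obj:
        return ["blobs is required"]

    errors = []
    for i, blob in enumerate(obj["blobs"]):
        prop = blob.get("property")
        if prop not in schema_properties:
            errors.append(f"blobs[{i}]: property {prop!r} is not in the bucket schema")
        blob_type = blob.get("type")
        if blob_type not in ALLOWED_BLOB_TYPES:
            errors.append(f"blobs[{i}]: unsupported blob type {blob_type!r}")
    return errors


candidate = {
    "key_prefix": "/podcasts/2024",
    "metadata": {"language": "en"},
    "blobs": [{"property": "transcript", "type": "text", "data": "Hello."}],
}
problems = validate_object(candidate, {"transcript", "cover_image"})
```

Running a check like this locally only catches obvious mismatches early; the server-side bucket schema remains the source of truth.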
Create an Object
- skip_duplicates: avoid reprocessing identical blobs by content hash.
- key_prefix: namespacing for logical groupings.
- metadata: provide information used later by taxonomies or retrievers.
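As a rough illustration of how these options fit together, the sketch below builds (but does not send) a create request using only the Python standard library. Several details are assumptions rather than documented facts: the base URL, the `POST /v1/buckets/{bucket_id}/objects` path (inferred from the list endpoint), bearer-token authentication, the blob field names, and the placement of `skip_duplicates` in the JSON body.

```python
import json
import urllib.request

API_BASE = "https://api.mixpeek.com"  # assumed base URL


def build_create_object_request(bucket_id, api_key, blobs,
                                key_prefix=None, metadata=None,
                                skip_duplicates=True):
    """Build a POST request that would create one object in a bucket."""
    body = {"blobs": blobs, "skip_duplicates": skip_duplicates}
    # Optional fields are only included when the caller provides them.
    if key_prefix is not None:
        body["key_prefix"] = key_prefix
    if metadata is not None:
        body["metadata"] = metadata

    return urllib.request.Request(
        url=f"{API_BASE}/v1/buckets/{bucket_id}/objects",  # assumed path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_object_request(
    bucket_id="bkt_example",  # hypothetical identifier
    api_key="YOUR_API_KEY",
    blobs=[{"property": "transcript", "type": "text", "data": "Hello."}],
    key_prefix="/podcasts/2024",
    metadata={"language": "en"},
)
```

Passing `request` to `urllib.request.urlopen` would send it. Creating the object this way does not start feature extraction; that still requires creating and submitting a batch.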
Retrieve Objects
- return_url=true generates presigned URLs for blobs (expires in ≈ 1 hour).
- Listing objects supports rich filtering and pagination: POST /v1/buckets/<bucket_id>/objects/list.
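Paging through a listing might look like the sketch below. The request and response shapes are not given here, so the body fields `limit`, `offset`, and `filters` and the `results` response key are all assumptions. The HTTP call is passed in as a hypothetical `fetch_page` callable so the paging logic stays independent of any particular client.

```python
def iter_objects(fetch_page, filters=None, page_size=100):
    """Yield objects one at a time across all pages of a listing.

    `fetch_page(body)` is a caller-supplied function that POSTs `body` to
    /v1/buckets/<bucket_id>/objects/list and returns the decoded JSON.
    """
    offset = 0
    while True:
        body = {"limit": page_size, "offset": offset}
        if filters:
            body["filters"] = filters
        page = fetch_page(body)
        results = page.get("results", [])
        yield from results
        # A short (or empty) page means there is nothing left to fetch.
        if len(results) < page_size:
            break
        offset += page_size


# Stand-in for a real HTTP call, useful for trying the pager offline.
_fake_store = [{"object_id": f"obj_{n}"} for n in range(5)]


def fake_fetch_page(body):
    start = body["offset"]
    return {"results": _fake_store[start:start + body["limit"]]}


all_ids = [o["object_id"] for o in iter_objects(fake_fetch_page, page_size=2)]
```

If the API turns out to use cursor-based pagination instead of offsets, only the loop's bookkeeping would need to change.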
Lineage
Downstream documents retain:
- source_blobs link back to object blobs (without duplicating large content).
- document_blobs contain extractor-generated artifacts (e.g., thumbnails).
- To inspect the entire decomposition tree for an object, call /v1/objects/{object_id}/decomposition-tree.
Best Practices
- Define bucket schemas up front so object validation fails fast.
- Set metadata that retrievers or taxonomies will use for filtering.
- Chunk large uploads into multiple objects instead of massive blobs for better parallelism.
- Use batches (/v1/buckets/{bucket}/batches) to process groups of objects efficiently.
- Track key prefixes to simplify downstream grouping or deduplication during retrieval.
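The chunking and key-prefix practices can be combined. Below is a minimal sketch, assuming a text blob and a naive fixed-size character split; the blob field names and the `chunk_index` metadata key are illustrative choices, not part of the documented schema.

```python
def chunk_into_objects(text, property_name, key_prefix, chunk_chars=2000):
    """Split one large text into several object payloads.

    Every payload shares the same key_prefix so the pieces can be grouped
    again downstream, and records its position in metadata.
    """
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")

    objects = []
    for index, start in enumerate(range(0, len(text), chunk_chars)):
        objects.append({
            "key_prefix": key_prefix,
            "metadata": {"chunk_index": index},  # illustrative key
            "blobs": [{
                "property": property_name,
                "type": "text",
                "data": text[start:start + chunk_chars],
            }],
        })
    return objects


payloads = chunk_into_objects("a" * 4500, "transcript", "/reports/q3")
```

A fixed character window can cut sentences in half; splitting on paragraph or sentence boundaries is usually a better choice for real content. The resulting objects can then be processed together in one batch.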

