Authorizations
Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings.
Headers
REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.
"Bearer sk_live_abc123def456"
"Bearer sk_test_xyz789"
REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace ID (format: ns_xxxxxxxxxxxxx) or a custom namespace name such as 'my-namespace'.
"ns_abc123def456"
"production"
"my-namespace"
Path Parameters
The ID of the collection.
The ID of the document to patch.
Body
Request model for partially updating a document (PATCH operation).
Updated metadata for the document.
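The pieces above (bearer token, namespace, the two path parameters, and a metadata body) can be sketched as a single request object. This is a minimal sketch, not the official client: the base URL, the URL path layout, the `X-Namespace` header name, and the `metadata` body key are all assumptions that this page does not confirm, and `build_patch_request` is a hypothetical helper.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.mixpeek.com"  # assumed base URL


def build_patch_request(collection_id, document_id, metadata, api_key, namespace):
    """Prepare (but do not send) a PATCH request for one document."""
    # Assumed path layout; quote() keeps unusual IDs from breaking the URL.
    url = (
        f"{BASE_URL}/v1/collections/{quote(collection_id, safe='')}"
        f"/documents/{quote(document_id, safe='')}"
    )
    body = json.dumps({"metadata": metadata}).encode("utf-8")  # assumed body key
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",  # format given in this reference
            "X-Namespace": namespace,  # assumed header name
            "Content-Type": "application/json",
        },
    )


request = build_patch_request(
    "col_articles", "doc_abc123", {"reviewed": True}, "sk_test_xyz789", "production"
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Building the request separately from sending it makes the URL, headers, and body easy to inspect before any network call is made.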
Response
Successful Response
Response model for a single document.
This is the standard response format when fetching documents via API endpoints. Contains all document data plus optional presigned URLs for S3 blobs.
Key Fields:
- document_id: Application ID for queries/references
- lineage: Complete processing history and source tracking
- metadata: User fields from field_passthrough
- source_blobs: Which blobs from source object were processed
- document_blobs: Artifacts generated by extractor (thumbnails, etc.)
- enrichment fields: Flat taxonomy/cluster fields (e.g., taxonomy_*_label, cluster_id)
Query Parameters Affecting Response:
- return_url=true: Adds presigned_url to each document_blobs entry
- return_vectors=true: Includes embedding arrays in response
Use Cases:
- Display document details in UI
- Download source files or generated artifacts
- Understand document provenance and processing
- Access enrichment fields (flat) for filtering/display
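As a rough illustration of the query parameters and flat enrichment fields described above, the sketch below builds the query string and picks enrichment fields out of a response payload. The helper names and the sample payload are hypothetical; treating every key that starts with `taxonomy_` as an enrichment field is an assumption based on the `taxonomy_*_label` pattern.

```python
from urllib.parse import urlencode


def response_query(return_url: bool = False, return_vectors: bool = False) -> str:
    """Build the query string; flags left False are omitted."""
    params = {}
    if return_url:
        params["return_url"] = "true"
    if return_vectors:
        params["return_vectors"] = "true"
    return urlencode(params)


def enrichment_fields(document: dict) -> dict:
    """Pick flat enrichment fields (taxonomy_* keys and cluster_id)."""
    return {
        key: value
        for key, value in document.items()
        if key.startswith("taxonomy_") or key == "cluster_id"
    }


query = response_query(return_url=True, return_vectors=True)

sample = {  # hypothetical response payload
    "document_id": "doc_abc123",
    "taxonomy_topics_label": "sports",
    "cluster_id": "cl_7",
}
fields = enrichment_fields(sample)
```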
REQUIRED. Unique identifier for the document. Format: 'doc_' prefix + alphanumeric characters. Use for: API queries, references, filtering.
"doc_f8966ff29c18e20c6b45e053"
"doc_abc123"
REQUIRED. ID of the collection this document belongs to. Format: 'col_' prefix + alphanumeric characters. Use for: Collection-scoped queries, filtering.
"col_articles"
"col_video_frames"
Schema version of the document payload returned by the API.
Available options: v1, v2
"v1"
"v2"
Denormalized root object identifier for the document.
"obj_video123"
Denormalized bucket identifier for the document's root object.
"bkt_marketing"
Immediate parent type that produced this document (bucket or collection).
Available options: bucket, collection
"bucket"
"collection"
Collection identifier of the immediate parent when sourced from a collection.
"col_frames"
Document identifier of the immediate parent when sourced from a collection.
"doc_frame_050"
Bucket object identifier of the immediate parent when sourced from a bucket.
"obj_video_123"
Materialized lineage path for fast lookups (e.g., 'bkt_123/col_frames/col_scenes').
"bkt_marketing/col_frames/col_scenes"
NOT REQUIRED (optional for v2 flat schema). Complete lineage from root object through all transformations. Contains: root_object_id, root_bucket_id, source_type, lineage_chain. For v2 schema, lineage data is stored as flat fields on the document.
Use lineage.root_object_id to access the source object ID. Use lineage.root_bucket_id to fetch the source object: GET /buckets/{root_bucket_id}/objects/{root_object_id}
Complete lineage chain for a document in a decomposition tree.
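A minimal sketch of deriving the source-object route from a document payload follows. It assumes that v2 documents carry the flat fields under the same names (`root_bucket_id`, `root_object_id`) as the nested `lineage` object, which this page implies but does not state; `source_object_path` is a hypothetical helper.

```python
def source_object_path(document: dict) -> str:
    """Return the GET path for the object a document was derived from."""
    lineage = document.get("lineage") or {}
    # Prefer the nested lineage object; fall back to flat fields (assumed names).
    bucket_id = lineage.get("root_bucket_id") or document.get("root_bucket_id")
    object_id = lineage.get("root_object_id") or document.get("root_object_id")
    if not bucket_id or not object_id:
        raise KeyError("document has no root_bucket_id/root_object_id")
    return f"/buckets/{bucket_id}/objects/{object_id}"


nested = {"lineage": {"root_bucket_id": "bkt_marketing", "root_object_id": "obj_video123"}}
flat = {"root_bucket_id": "bkt_marketing", "root_object_id": "obj_video123"}
```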
Every document in Mixpeek tracks its complete processing history from the original raw object in a bucket through all transformation stages.
This enables:
- Tracing any document back to its source object
- Understanding the full processing pipeline
- Querying all documents derived from a specific object
- Building decomposition tree visualizations
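To make the last point concrete, here is a small sketch that groups documents under their immediate parent and renders an indented outline. The records are hypothetical and use the `source_type`, `source_object_id`, and `source_document_id` field names from the lineage model; real payloads may differ.

```python
from collections import defaultdict


def children_by_parent(documents: list[dict]) -> dict[str, list[str]]:
    """Map each parent ID (object or document) to its child document IDs."""
    tree = defaultdict(list)
    for doc in documents:
        if doc["source_type"] == "bucket":
            parent = doc["source_object_id"]
        else:  # source_type == "collection"
            parent = doc["source_document_id"]
        tree[parent].append(doc["document_id"])
    return dict(tree)


def render(tree: dict[str, list[str]], node: str, depth: int = 0) -> list[str]:
    """Return the subtree rooted at `node` as indented lines."""
    lines = ["  " * depth + node]
    for child in tree.get(node, []):
        lines.extend(render(tree, child, depth + 1))
    return lines


documents = [  # hypothetical records
    {"document_id": "doc_frame050", "source_type": "bucket",
     "source_object_id": "obj_video123"},
    {"document_id": "doc_scene007", "source_type": "collection",
     "source_document_id": "doc_frame050"},
]
tree = children_by_parent(documents)
outline = render(tree, "obj_video123")
```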
Example lineage for a scene document:
Object (video) → Frames Collection → Scenes Collection
- root_object_id tracks the original video
- lineage_chain shows: [frames step, scenes step]
Example:
```python
from datetime import datetime

# DocumentSourceLineage and LineageStep are the lineage models described
# above; their import path is not shown in this reference.

# First-level processing (bucket → collection)
lineage = DocumentSourceLineage(
    root_object_id="obj_video123",
    root_bucket_id="bkt_marketing",
    source_type="bucket",
    source_object_id="obj_video123",
    lineage_chain=[
        LineageStep(
            collection_id="col_frames",
            feature_extractor_id="video_extractor_v1",
            timestamp=datetime.now(),
        )
    ],
)

# Second-level processing (collection → collection)
lineage = DocumentSourceLineage(
    root_object_id="obj_video123",
    root_bucket_id="bkt_marketing",
    source_type="collection",
    source_document_id="doc_frame050",
    source_collection_id="col_frames",
    lineage_chain=[
        LineageStep(...),  # frames step
        LineageStep(...),  # scenes step
    ],
)
```
{
  "description": "First-level processing: video → frames",
  "lineage_chain": [
    {
      "collection_id": "col_frames",
      "feature_extractor_id": "video_extractor_v1",
      "timestamp": "2025-10-18T10:30:00Z"
    }
  ],
  "root_bucket_id": "bkt_marketing",
  "root_object_id": "obj_video123",
  "source_object_id": "obj_video123",
  "source_type": "bucket"
}
{
  "description": "Second-level processing: frames → scenes",
  "lineage_chain": [
    {
      "collection_id": "col_frames",
      "document_id": "doc_frame050",
      "feature_extractor_id": "video_extractor_v1",
      "timestamp": "2025-10-18T10:30:00Z"
    },
    {
      "collection_id": "col_scenes",
      "feature_extractor_id": "scene_detector_v1",
      "timestamp": "2025-10-18T10:31:15Z"
    }
  ],
  "root_bucket_id": "bkt_marketing",
  "root_object_id": "obj_video123",
  "source_collection_id": "col_frames",
  "source_document_id": "doc_frame050",
  "source_type": "collection"
}
Lightweight references to source object's blobs (blob_id, blob_property, blob_type). For full details, fetch: GET /buckets/{bucket_id}/objects/{object_id}
System metadata (ingestion_status, feature_extractor_config_hash, etc.)
User-provided metadata inherited from the source object
Vector embedding for the document (only included when return_vectors=true)
NOT REQUIRED. Only populated when the return_url=true query parameter is used. Single presigned URL for the primary source blob (kept for backward compatibility). For multiple blobs, use the presigned_urls array or document_blobs[].presigned_url instead.
Artifacts generated during feature extraction (thumbnails, processed outputs). When return_url=true, each entry includes a presigned_url field.
Aggregated presigned URLs for all blobs. Only populated when return_url=true query parameter is provided.
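When return_url=true is set, the per-blob URLs can be gathered from document_blobs[].presigned_url. The sketch below is illustrative only: the sample payload, the `blob_id` key on document blobs, and the example.com URL are hypothetical, and `blob_urls` is not part of any SDK.

```python
def blob_urls(document: dict) -> list[str]:
    """Collect presigned URLs from document_blobs, skipping entries without one."""
    return [
        blob["presigned_url"]
        for blob in document.get("document_blobs", [])
        if blob.get("presigned_url")
    ]


sample = {  # hypothetical response fetched with return_url=true
    "document_id": "doc_abc123",
    "document_blobs": [
        {"blob_id": "thumb", "presigned_url": "https://example.com/thumb.jpg"},
        {"blob_id": "preview"},  # no URL present on this entry
    ],
}
urls = blob_urls(sample)
```

Entries without a presigned_url are skipped rather than raising, since the field is only present when the query parameter was supplied.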

