Authorizations
Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings.
Headers
REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.
"Bearer sk_live_abc123def456"
"Bearer sk_test_xyz789"
REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. Provide either the namespace ID (format: ns_xxxxxxxxxxxxx) or the namespace name (a custom name like 'my-namespace').
"ns_abc123def456"
"production"
"my-namespace"
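As a minimal sketch of how the two required headers might be assembled: the header names are not given in this section, so `Authorization` (the conventional header for Bearer tokens) and `X-Namespace` are assumptions, and `build_headers` is a hypothetical helper rather than part of any SDK.

```python
def build_headers(api_key: str, namespace: str) -> dict:
    """Build the two required request headers.

    Assumption: the namespace travels in a header named "X-Namespace";
    this section does not state the actual header name.
    """
    if not api_key:
        raise ValueError("api_key is required")
    if not namespace:
        raise ValueError("namespace is required")
    # Accept either a raw key or a value that already carries the prefix.
    token = api_key if api_key.startswith("Bearer ") else f"Bearer {api_key}"
    return {"Authorization": token, "X-Namespace": namespace}


# Example: a test key scoped to the "production" namespace.
headers = build_headers("sk_test_xyz789", "production")
```

The namespace value can be either form described above (an ID such as "ns_abc123def456" or a name such as "my-namespace"); the helper passes it through unchanged.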
Path Parameters
The ID of the collection.
The ID of the document to retrieve.
Response
Successful Response
Response model for a single document.
This is the standard response format when fetching documents via API endpoints. Contains all document data plus optional presigned URLs for S3 blobs.
Key Fields:
- document_id: Application ID for queries/references
- lineage: Complete processing history and source tracking
- User-defined fields: Any fields from source passed via field_passthrough
- source_blobs: Which blobs from source object were processed
- document_blobs: Artifacts generated by extractor (thumbnails, etc.)
- enrichment fields: Flat taxonomy/cluster fields (e.g., taxonomy_*_label, cluster_id)

Query Parameters Affecting Response:
- return_url=true: Adds presigned_url to each document_blobs entry
- return_vectors=true: Includes embedding arrays in response

Use Cases:
- Display document details in UI
- Download source files or generated artifacts
- Understand document provenance and processing
- Access enrichment fields (flat) for filtering/display
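The path and query parameters described above could be combined into a request URL roughly as follows. This is a sketch under stated assumptions: the host `api.example.com` is a placeholder and the route `/collections/{collection_id}/documents/{document_id}` is a guess at the path shape, since neither appears in this section. Only `return_url` and `return_vectors` come from the text.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.example.com"  # placeholder host, not the real API host


def document_url(collection_id: str, document_id: str,
                 return_url: bool = False, return_vectors: bool = False) -> str:
    """Build the URL for fetching one document (route shape is assumed)."""
    path = (
        f"/collections/{quote(collection_id, safe='')}"
        f"/documents/{quote(document_id, safe='')}"
    )
    params = {}
    if return_url:
        params["return_url"] = "true"        # adds presigned_url to blob entries
    if return_vectors:
        params["return_vectors"] = "true"    # includes embedding arrays
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{path}{query}"


# Example: request presigned URLs but leave vectors out of the response.
url = document_url("col_articles", "doc_abc123", return_url=True)
```

Leaving both flags off keeps the response smaller, which is usually preferable when only metadata is needed.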
REQUIRED. Unique identifier for the document. Format: 'doc_' prefix + alphanumeric characters. Use for: API queries, references, filtering.
"doc_f8966ff29c18e20c6b45e053"
"doc_abc123"
REQUIRED. ID of the collection this document belongs to. Format: 'col_' prefix + alphanumeric characters. Use for: Collection-scoped queries, filtering.
"col_articles"
"col_video_frames"
Denormalized root object identifier for the document.
"obj_video123"
Denormalized bucket identifier for the document's root object.
"bkt_marketing"
Immediate parent type that produced this document (bucket or collection).
Available options: bucket, collection
"bucket"
"collection"
Collection identifier of the immediate parent when sourced from a collection.
"col_frames"
Document identifier of the immediate parent when sourced from a collection.
"doc_frame_050"
Bucket object identifier of the immediate parent when sourced from a bucket.
"obj_video_123"
Materialized lineage path for fast lookups (e.g., 'bkt_123/col_frames/col_scenes').
"bkt_marketing/col_frames/col_scenes"
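A materialized path like the one above can be split into its steps on the client side. The following is a minimal sketch; `parse_lineage_path` is a hypothetical helper, and the mapping from the `bkt_` and `col_` prefixes to resource kinds is inferred from the examples in this section rather than from a stated rule.

```python
def parse_lineage_path(path: str) -> list:
    """Split a lineage path into (kind, identifier) pairs, root first.

    Assumption: segments are separated by "/" and the prefixes "bkt_" and
    "col_" mark buckets and collections, as in the documented examples.
    """
    kinds = {"bkt_": "bucket", "col_": "collection"}
    steps = []
    for segment in path.split("/"):
        if not segment:
            continue  # tolerate leading, trailing, or doubled slashes
        kind = next(
            (name for prefix, name in kinds.items() if segment.startswith(prefix)),
            "unknown",
        )
        steps.append((kind, segment))
    return steps


# Example: the path shown above, from root bucket to final collection.
steps = parse_lineage_path("bkt_marketing/col_frames/col_scenes")
```

The first pair is the root bucket and the last pair is the collection that holds the document, which makes it straightforward to render a breadcrumb or check whether a document descends from a given collection.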
Processing steps from root object to this document. Contains: collection_id, feature_extractor_id, document_id (if intermediate), timestamp. Use for: Visualizing pipeline, debugging, audit trail.
Lightweight references to source object's blobs (blob_id, blob_property, blob_type). For full details, fetch: GET /buckets/{bucket_id}/objects/{object_id}
System metadata (ingestion_status, feature_extractor_config_hash, etc.)
Vector embedding for the document (only included when return_vectors=true)
NOT REQUIRED. Only populated when the return_url=true query parameter is used. Single presigned URL for the primary source blob (kept for backward compatibility). For multiple blobs, use the presigned_urls array or document_blobs[].presigned_url instead.
Artifacts generated during feature extraction (thumbnails, processed outputs). When return_url=true, each entry includes a presigned_url field.
Aggregated presigned URLs for all blobs. Only populated when return_url=true query parameter is provided.
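Because the URL fields are only populated when return_url=true, client code should treat them as optional. Here is a hedged sketch of collecting the per-artifact URLs from a parsed response; `blob_urls` is a hypothetical helper, and the sample dictionary shows only the keys named in this section (`document_id`, `document_blobs`, `presigned_url`), with made-up URL values.

```python
def blob_urls(document: dict) -> list:
    """Return the presigned_url of every document_blobs entry that has one.

    Entries without a presigned_url (for example, when the request did not
    set return_url=true) are skipped instead of raising an error.
    """
    urls = []
    for blob in document.get("document_blobs") or []:
        url = blob.get("presigned_url")
        if url:
            urls.append(url)
    return urls


# Illustrative response fragment; the URL values are invented.
sample = {
    "document_id": "doc_abc123",
    "document_blobs": [
        {"presigned_url": "https://example.com/thumb-1"},
        {},  # an entry returned without a URL
    ],
}
urls = blob_urls(sample)
```

Presigned URLs are typically short-lived, so it is generally safer to fetch them again when needed than to store them.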

