Get a document by ID.

curl --request GET \
  --url https://api.mixpeek.com/v1/collections/{collection_identifier}/documents/{document_id} \
  --header 'Authorization: Bearer <token>' \
  --header 'X-Namespace: <x-namespace>'

{
  "collection_id": "col_articles",
  "description": "Text document without artifacts (return_url=false)",
  "document_blobs": [],
  "document_id": "doc_f8966ff29c18e20c6b45e053",
  "lineage_chain": [
    {
      "collection_id": "col_articles",
      "feature_extractor_id": "text_extractor_v1",
      "timestamp": "2025-10-31T10:00:00Z"
    }
  ],
  "lineage_path": "bkt_content/col_articles",
  "metadata": {
    "author": "Dr. Smith",
    "ingestion_status": "COMPLETED",
    "title": "AI in Healthcare"
  },
  "root_bucket_id": "bkt_content",
  "root_object_id": "obj_article_001",
  "source_blobs": [
    {
      "blob_id": "blob_text_001",
      "blob_property": "content",
      "blob_type": "text"
    }
  ],
  "source_object_id": "obj_article_001",
  "source_type": "bucket"
}

GET /collections/{collection_identifier}/documents/{document_id}

Authorizations

Authorization

string

header

required

Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings.

Headers

Authorization

string

required

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer sk_live_abc123def456"

"Bearer sk_test_xyz789"

X-Namespace

string

required

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'

Examples:

"ns_abc123def456"

"production"

"my-namespace"

Path Parameters

collection_identifier

string

required

The ID of the collection.

document_id

string

required

The ID of the document to retrieve.

Query Parameters

return_url

boolean | null

default:false

When true, adds a presigned_url field to each document_blobs entry.

return_vectors

boolean | null

default:false

When true, includes the document's vector embedding in the response.
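The headers, path parameters, and query parameters above can be combined into a single request. The following is a minimal sketch using only the Python standard library. The base URL comes from the curl example; the function name `build_get_document_request` is hypothetical, and sending booleans as lowercase `true` is an assumption based on the `return_url=true` notation used on this page.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://api.mixpeek.com/v1"  # taken from the curl example


def build_get_document_request(api_key, namespace, collection_identifier,
                               document_id, return_url=False,
                               return_vectors=False):
    """Build (but do not send) the GET request for a single document."""
    # Percent-encode path parameters so unusual characters cannot break the URL.
    path = "/collections/{}/documents/{}".format(
        quote(collection_identifier, safe=""), quote(document_id, safe=""))
    # Both flags default to false, so only the ones set to true are sent.
    params = {}
    if return_url:
        params["return_url"] = "true"
    if return_vectors:
        params["return_vectors"] = "true"
    url = BASE_URL + path + ("?" + urlencode(params) if params else "")
    headers = {
        "Authorization": "Bearer " + api_key,
        "X-Namespace": namespace,
    }
    return Request(url, headers=headers, method="GET")


req = build_get_document_request(
    "sk_test_xyz789", "my-namespace", "col_articles",
    "doc_f8966ff29c18e20c6b45e053", return_url=True)
print(req.full_url)
```

To actually send it, the request object can be passed to `urllib.request.urlopen` and the body decoded with `json.load`.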

Response

Successful Response

Response model for a single document.

This is the standard response format when fetching documents via API endpoints. Contains all document data plus optional presigned URLs for S3 blobs.

Key Fields:
- document_id: Application ID for queries/references
- lineage: Complete processing history and source tracking
- User-defined fields: Any fields from source passed via field_passthrough
- source_blobs: Which blobs from source object were processed
- document_blobs: Artifacts generated by extractor (thumbnails, etc.)
- enrichment fields: Flat taxonomy/cluster fields (e.g., taxonomy_*_label, cluster_id)

Query Parameters Affecting Response:
- return_url=true: Adds presigned_url to each document_blobs entry
- return_vectors=true: Includes embedding arrays in response

Use Cases:
- Display document details in UI
- Download source files or generated artifacts
- Understand document provenance and processing
- Access enrichment fields (flat) for filtering/display
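Because enrichment fields are flat, they sit at the top level of the response next to the standard fields. As a rough sketch, they could be picked out by name pattern. The patterns below are assumptions taken only from the two examples named above (`taxonomy_*_label`, `cluster_id`), the helper name is hypothetical, and the sample values are illustrative rather than real API output.

```python
from fnmatch import fnmatch

# Assumed patterns, based only on the examples named in the response model.
ENRICHMENT_PATTERNS = ("taxonomy_*_label", "cluster_id")


def enrichment_fields(document):
    """Return the top-level fields whose names match an enrichment pattern."""
    return {
        key: value
        for key, value in document.items()
        if any(fnmatch(key, pattern) for pattern in ENRICHMENT_PATTERNS)
    }


doc = {
    "document_id": "doc_abc123",
    "collection_id": "col_articles",
    "taxonomy_topics_label": "healthcare",  # illustrative value
    "cluster_id": "cl_42",                  # illustrative value
}
print(enrichment_fields(doc))
```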

document_id

string

required

REQUIRED. Unique identifier for the document. Format: 'doc_' prefix + alphanumeric characters. Use for: API queries, references, filtering.

Examples:

"doc_f8966ff29c18e20c6b45e053"

"doc_abc123"

collection_id

string

required

REQUIRED. ID of the collection this document belongs to. Format: 'col_' prefix + alphanumeric characters. Use for: Collection-scoped queries, filtering.

Examples:

"col_articles"

"col_video_frames"

root_object_id

string | null

Denormalized root object identifier for the document.

Examples:

"obj_video123"

root_bucket_id

string | null

Denormalized bucket identifier for the document's root object.

Examples:

"bkt_marketing"

source_type

enum<string> | null

Immediate parent type that produced this document (bucket or collection).

Available options: bucket, collection

Examples:

"bucket"

"collection"

source_collection_id

string | null

Collection identifier of the immediate parent when sourced from a collection.

Examples:

"col_frames"

source_document_id

string | null

Document identifier of the immediate parent when sourced from a collection.

Examples:

"doc_frame_050"

source_object_id

string | null

Bucket object identifier of the immediate parent when sourced from a bucket.

Examples:

"obj_video_123"
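Which of the three source_* identifiers is populated depends on source_type. A small sketch of resolving the immediate parent from a response dictionary follows; the helper name `immediate_parent` and the shape of its return value are hypothetical.

```python
def immediate_parent(document):
    """Describe the immediate parent of a document, or None if unknown."""
    source_type = document.get("source_type")
    if source_type == "bucket":
        # Sourced directly from a bucket object.
        return {"type": "bucket",
                "object_id": document.get("source_object_id")}
    if source_type == "collection":
        # Sourced from a document in an upstream collection.
        return {"type": "collection",
                "collection_id": document.get("source_collection_id"),
                "document_id": document.get("source_document_id")}
    return None  # source_type is nullable


from_bucket = {"source_type": "bucket", "source_object_id": "obj_article_001"}
from_collection = {
    "source_type": "collection",
    "source_collection_id": "col_frames",
    "source_document_id": "doc_frame_050",
}
print(immediate_parent(from_bucket))
print(immediate_parent(from_collection))
```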

lineage_path

string | null

Materialized lineage path for fast lookups (e.g., 'bkt_123/col_frames/col_scenes').

Examples:

"bkt_marketing/col_frames/col_scenes"
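The examples suggest that the path is a slash-separated list starting with the root bucket and followed by the collections the data passed through. Under that assumption (inferred from the examples, not stated by the API), it can be split like this; the helper name is hypothetical.

```python
def split_lineage_path(lineage_path):
    """Split a lineage path into (root_bucket_id, [collection_id, ...])."""
    if not lineage_path:
        return None, []  # lineage_path is nullable
    root_bucket_id, *collection_ids = lineage_path.split("/")
    return root_bucket_id, collection_ids


root, collections = split_lineage_path("bkt_marketing/col_frames/col_scenes")
print(root, collections)
```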

lineage_chain

LineageStep · object[]

Processing steps from root object to this document. Contains: collection_id, feature_extractor_id, document_id (if intermediate), timestamp. Use for: Visualizing pipeline, debugging, audit trail.
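For debugging or an audit trail, the chain can be rendered as readable lines. This is a sketch only: it sorts by timestamp on the assumption that this yields root-to-document order (the API may already return steps in order), and the helper names are hypothetical.

```python
from datetime import datetime


def parse_timestamp(value):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def describe_lineage(lineage_chain):
    """Return one human-readable line per processing step, oldest first."""
    steps = sorted(lineage_chain,
                   key=lambda step: parse_timestamp(step["timestamp"]))
    lines = []
    for number, step in enumerate(steps, start=1):
        line = "{}. {} via {} at {}".format(
            number, step["collection_id"], step["feature_extractor_id"],
            step["timestamp"])
        # document_id is only present for intermediate steps.
        if step.get("document_id"):
            line += " (intermediate document {})".format(step["document_id"])
        lines.append(line)
    return lines


chain = [{
    "collection_id": "col_articles",
    "feature_extractor_id": "text_extractor_v1",
    "timestamp": "2025-10-31T10:00:00Z",
}]
print("\n".join(describe_lineage(chain)))
```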

source_blobs

Source Blobs · object[]

Lightweight references to source object's blobs (blob_id, blob_property, blob_type). For full details, fetch: GET /buckets/{bucket_id}/objects/{object_id}
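A sketch of building that follow-up URL from a document response is shown below. Two assumptions are made here and are not confirmed by this page: that the object route lives under the same `https://api.mixpeek.com/v1` base as the curl example, and that `root_bucket_id` is the bucket containing `source_object_id` (which holds in the sample response, where the source object is also the root object). The helper name is hypothetical.

```python
from urllib.parse import quote

BASE_URL = "https://api.mixpeek.com/v1"  # assumed to apply to this route too


def source_object_url(document):
    """Return the URL for the full source object, or None if IDs are missing."""
    bucket_id = document.get("root_bucket_id")
    object_id = document.get("source_object_id")
    if not bucket_id or not object_id:
        return None  # e.g. the document was sourced from a collection
    return "{}/buckets/{}/objects/{}".format(
        BASE_URL, quote(bucket_id, safe=""), quote(object_id, safe=""))


doc = {"root_bucket_id": "bkt_content", "source_object_id": "obj_article_001"}
print(source_object_url(doc))
```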

metadata

object

Unified metadata dictionary containing both user-defined and system metadata (ingestion_status, feature_extractor_config_hash, processing_history, etc.)
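Since user-defined and system keys share one dictionary, a client may want to separate them before display. The sketch below uses only the three system keys named above; the description ends in "etc.", so the real set is likely larger and this list should be treated as an incomplete assumption. The helper name is hypothetical.

```python
# Only the system keys named in the field description; likely incomplete.
SYSTEM_KEYS = {
    "ingestion_status",
    "feature_extractor_config_hash",
    "processing_history",
}


def split_metadata(metadata):
    """Split a metadata dict into (system_fields, user_fields)."""
    system = {k: v for k, v in metadata.items() if k in SYSTEM_KEYS}
    user = {k: v for k, v in metadata.items() if k not in SYSTEM_KEYS}
    return system, user


metadata = {
    "author": "Dr. Smith",
    "ingestion_status": "COMPLETED",
    "title": "AI in Healthcare",
}
system, user = split_metadata(metadata)
print(system)
print(user)
```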

vector

number[] | null

Vector embedding for the document (only included when return_vectors=true)

document_blobs

BlobURLRef · object[]

Artifacts generated during feature extraction (thumbnails, processed outputs). When return_url=true, each entry includes a presigned_url field.
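A minimal sketch of collecting download links from a response follows. It relies only on the `presigned_url` field named above; the sample URL is a placeholder, the helper name is hypothetical, and entries without the field (as when return_url is false) are skipped.

```python
def presigned_urls(document):
    """Return the presigned URLs found on a document's generated artifacts."""
    blobs = document.get("document_blobs") or []
    return [blob["presigned_url"] for blob in blobs
            if blob.get("presigned_url")]


doc_with_urls = {
    "document_blobs": [
        {"presigned_url": "https://example.com/thumbnail.jpg"},  # placeholder
        {},  # an entry returned without a URL
    ]
}
doc_without_artifacts = {"document_blobs": []}
print(presigned_urls(doc_with_urls))
print(presigned_urls(doc_without_artifacts))
```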
