PATCH /v1/collections/{collection_identifier}/documents/{document_id}
Patch Document
curl --request PATCH \
  --url https://api.mixpeek.com/v1/collections/{collection_identifier}/documents/{document_id} \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --header 'X-Namespace: <x-namespace>' \
  --data '{
  "metadata": {}
}'
{
  "collection_id": "col_articles",
  "description": "Text document without artifacts (return_url=false)",
  "document_blobs": [],
  "document_id": "doc_f8966ff29c18e20c6b45e053",
  "internal_metadata": {
    "ingestion_status": "COMPLETED"
  },
  "lineage": {
    "root_bucket_id": "bkt_content",
    "root_object_id": "obj_article_001",
    "source_object_id": "obj_article_001",
    "source_type": "bucket"
  },
  "metadata": {
    "author": "Dr. Smith",
    "title": "AI in Healthcare"
  },
  "presigned_urls": [],
  "source_blobs": [
    {
      "blob_id": "blob_text_001",
      "blob_property": "content",
      "blob_type": "text"
    }
  ]
}
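
The same request can be sketched in Python. This is a minimal example using only the standard library; the helper name `build_patch_request` and the `API_BASE` constant are illustrative and not part of any Mixpeek SDK. The key and namespace values are the placeholders shown on this page. The request is built but not sent.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.mixpeek.com/v1"  # base URL taken from the curl example


def build_patch_request(collection_identifier, document_id, metadata, api_key, namespace):
    """Build (but do not send) a PATCH request that updates a document's metadata."""
    # Percent-encode path segments so unusual identifiers cannot break the URL.
    url = (
        f"{API_BASE}/collections/{quote(collection_identifier, safe='')}"
        f"/documents/{quote(document_id, safe='')}"
    )
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-Namespace": namespace,
        },
    )


request = build_patch_request(
    "col_articles",
    "doc_f8966ff29c18e20c6b45e053",
    {"author": "Dr. Smith", "title": "AI in Healthcare"},
    api_key="sk_test_xyz789",  # placeholder, not a working key
    namespace="my-namespace",
)
```

Passing `request` to `urllib.request.urlopen` would send it; that step needs a real API key and network access, so it is left out here.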

Authorizations

Authorization
string
header
required

Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings.

Headers

Authorization
string
required

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer sk_live_abc123def456"

"Bearer sk_test_xyz789"

X-Namespace
string
required

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'

Examples:

"ns_abc123def456"

"production"

"my-namespace"

Path Parameters

collection_identifier
string
required

The ID of the collection.

document_id
string
required

The ID of the document to patch.

Body

application/json

Request model for partially updating a document (PATCH operation).

metadata
object | null

Updated metadata for the document.

Response

Successful Response

Response model for a single document.

This is the standard response format when fetching documents via API endpoints. Contains all document data plus optional presigned URLs for S3 blobs.

Key Fields:

  • document_id: Application ID for queries/references
  • lineage: Complete processing history and source tracking
  • metadata: User fields from field_passthrough
  • source_blobs: Which blobs from source object were processed
  • document_blobs: Artifacts generated by extractor (thumbnails, etc.)
  • enrichment fields: Flat taxonomy/cluster fields (e.g., taxonomy_*_label, cluster_id)

Query Parameters Affecting Response:

  • return_url=true: Adds presigned_url to each document_blobs entry
  • return_vectors=true: Includes embedding arrays in response

Use Cases:

  • Display document details in UI
  • Download source files or generated artifacts
  • Understand document provenance and processing
  • Access enrichment fields (flat) for filtering/display
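
As an illustration of the two query parameters above, the sketch below appends them to a document URL. The helper `document_url` is hypothetical, and it assumes the parameters are passed as plain `true` strings on the same document path used by this endpoint; this page does not state which HTTP methods accept them.

```python
from urllib.parse import urlencode

# Base URL taken from the curl example on this page.
API_BASE = "https://api.mixpeek.com/v1"


def document_url(collection_identifier, document_id, return_url=False, return_vectors=False):
    """Build a document URL, adding only the query flags that are switched on."""
    base = f"{API_BASE}/collections/{collection_identifier}/documents/{document_id}"
    params = {}
    if return_url:
        params["return_url"] = "true"      # asks for presigned URLs on document_blobs
    if return_vectors:
        params["return_vectors"] = "true"  # asks for embedding arrays
    return f"{base}?{urlencode(params)}" if params else base


plain = document_url("col_articles", "doc_abc123")
with_urls = document_url("col_articles", "doc_abc123", return_url=True)
with_both = document_url("col_articles", "doc_abc123", return_url=True, return_vectors=True)
```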

document_id
string
required

REQUIRED. Unique identifier for the document. Format: 'doc_' prefix + alphanumeric characters. Use for: API queries, references, filtering.

Examples:

"doc_f8966ff29c18e20c6b45e053"

"doc_abc123"

collection_id
string
required

REQUIRED. ID of the collection this document belongs to. Format: 'col_' prefix + alphanumeric characters. Use for: Collection-scoped queries, filtering.

Examples:

"col_articles"

"col_video_frames"

document_schema_version
enum<string>
default:v1

Schema version of the document payload returned by the API.

Available options:
v1,
v2
Examples:

"v1"

"v2"

root_object_id
string | null

Denormalized root object identifier for the document.

Examples:

"obj_video123"

root_bucket_id
string | null

Denormalized bucket identifier for the document's root object.

Examples:

"bkt_marketing"

source_type
enum<string> | null

Immediate parent type that produced this document (bucket or collection).

Available options:
bucket,
collection
Examples:

"bucket"

"collection"

source_collection_id
string | null

Collection identifier of the immediate parent when sourced from a collection.

Examples:

"col_frames"

source_document_id
string | null

Document identifier of the immediate parent when sourced from a collection.

Examples:

"doc_frame_050"

source_object_id
string | null

Bucket object identifier of the immediate parent when sourced from a bucket.

Examples:

"obj_video_123"

lineage_path
string | null

Materialized lineage path for fast lookups (e.g., 'bkt_123/col_frames/col_scenes').

Examples:

"bkt_marketing/col_frames/col_scenes"
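
A lineage path like the one above can be split into its parts on the client side. The sketch below assumes, based only on the example value, that the first segment is the root bucket and every later segment is a collection; `split_lineage_path` is an illustrative helper, not an API function.

```python
def split_lineage_path(lineage_path):
    """Split a materialized lineage path into (root_bucket_id, [collection_ids])."""
    if not lineage_path:
        # lineage_path is nullable, so treat None or "" as "no lineage recorded".
        return None, []
    root_bucket_id, *collection_ids = lineage_path.split("/")
    return root_bucket_id, collection_ids


root, hops = split_lineage_path("bkt_marketing/col_frames/col_scenes")
```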

lineage
object | null

NOT REQUIRED (optional for v2 flat schema). Complete lineage from root object through all transformations. Contains: root_object_id, root_bucket_id, source_type, lineage_chain. For v2 schema, lineage data is stored as flat fields on the document. Use lineage.root_object_id to access the source object ID. Use lineage.root_bucket_id to fetch the source object: GET /buckets/{root_bucket_id}/objects/{root_object_id}

Complete lineage chain for a document in a decomposition tree.

Every document in Mixpeek tracks its complete processing history from the original raw object in a bucket through all transformation stages.

This enables:

  • Tracing any document back to its source object
  • Understanding the full processing pipeline
  • Querying all documents derived from a specific object
  • Building decomposition tree visualizations

Example lineage for a scene document: Object (video) → Frames Collection → Scenes Collection

  • root_object_id tracks the original video
  • lineage_chain shows: [frames step, scenes step]

Example:
```python
# First-level processing (bucket → collection)
lineage = DocumentSourceLineage(
    root_object_id="obj_video123",
    root_bucket_id="bkt_marketing",
    source_type="bucket",
    source_object_id="obj_video123",
    lineage_chain=[
        LineageStep(
            collection_id="col_frames",
            feature_extractor_id="video_extractor_v1",
            timestamp=datetime.now()
        )
    ]
)

# Second-level processing (collection → collection)
lineage = DocumentSourceLineage(
    root_object_id="obj_video123",
    root_bucket_id="bkt_marketing",
    source_type="collection",
    source_document_id="doc_frame050",
    source_collection_id="col_frames",
    lineage_chain=[
        LineageStep(...),  # frames step
        LineageStep(...)   # scenes step
    ]
)
```
Examples:
{
  "description": "First-level processing: video → frames",
  "lineage_chain": [
    {
      "collection_id": "col_frames",
      "feature_extractor_id": "video_extractor_v1",
      "timestamp": "2025-10-18T10:30:00Z"
    }
  ],
  "root_bucket_id": "bkt_marketing",
  "root_object_id": "obj_video123",
  "source_object_id": "obj_video123",
  "source_type": "bucket"
}
{
  "description": "Second-level processing: frames → scenes",
  "lineage_chain": [
    {
      "collection_id": "col_frames",
      "document_id": "doc_frame050",
      "feature_extractor_id": "video_extractor_v1",
      "timestamp": "2025-10-18T10:30:00Z"
    },
    {
      "collection_id": "col_scenes",
      "feature_extractor_id": "scene_detector_v1",
      "timestamp": "2025-10-18T10:31:15Z"
    }
  ],
  "root_bucket_id": "bkt_marketing",
  "root_object_id": "obj_video123",
  "source_collection_id": "col_frames",
  "source_document_id": "doc_frame050",
  "source_type": "collection"
}
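
The sketch below shows one way a client might read lineage from a document response. It assumes the response has been parsed into a plain dict, checks the flat root_object_id / root_bucket_id fields first and falls back to the nested lineage object, and builds the source-object path in the GET /buckets/{root_bucket_id}/objects/{root_object_id} form quoted above. The helper names are illustrative only.

```python
def root_ids(document):
    """Return (root_bucket_id, root_object_id) from flat fields or nested lineage."""
    lineage = document.get("lineage") or {}
    bucket_id = document.get("root_bucket_id") or lineage.get("root_bucket_id")
    object_id = document.get("root_object_id") or lineage.get("root_object_id")
    return bucket_id, object_id


def source_object_path(document):
    """Build the path used to fetch the root object, or None if IDs are missing."""
    bucket_id, object_id = root_ids(document)
    if not bucket_id or not object_id:
        return None
    return f"/buckets/{bucket_id}/objects/{object_id}"


def chain_collections(document):
    """List the collection IDs recorded in lineage_chain, in processing order."""
    lineage = document.get("lineage") or {}
    return [step["collection_id"] for step in lineage.get("lineage_chain") or []]


# Nested form, trimmed from the second-level example above.
nested_doc = {
    "lineage": {
        "lineage_chain": [
            {"collection_id": "col_frames", "feature_extractor_id": "video_extractor_v1"},
            {"collection_id": "col_scenes", "feature_extractor_id": "scene_detector_v1"},
        ],
        "root_bucket_id": "bkt_marketing",
        "root_object_id": "obj_video123",
    }
}
# Flat form, as described for the v2 schema.
flat_doc = {"root_bucket_id": "bkt_marketing", "root_object_id": "obj_video123"}
```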
source_blobs
Source Blobs · object[]

Lightweight references to source object's blobs (blob_id, blob_property, blob_type). For full details, fetch: GET /buckets/{bucket_id}/objects/{object_id}

internal_metadata
object

System metadata (ingestion_status, feature_extractor_config_hash, etc.)

metadata
object

User-provided metadata inherited from the source object

vector
number[] | null

Vector embedding for the document (only included when return_vectors=true)

presigned_url
string | null

NOT REQUIRED - only populated when return_url=true query parameter is used. Single presigned URL for the primary source blob (for backward compatibility). For multiple blobs, use presigned_urls array or document_blobs[].presigned_url instead.

document_blobs
BlobURLRef · object[]

Artifacts generated during feature extraction (thumbnails, processed outputs). When return_url=true, each entry includes a presigned_url field.

presigned_urls
PresignedURLModel · object[]

Aggregated presigned URLs for all blobs. Only populated when return_url=true query parameter is provided.
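
To illustrate how the URL fields relate, the sketch below collects download URLs from a parsed response. It prefers document_blobs[].presigned_url and falls back to the single presigned_url field, as the descriptions above suggest. The shape of presigned_urls entries is not documented on this page, so that array is deliberately not parsed. `blob_urls` is a hypothetical helper.

```python
def blob_urls(document):
    """Collect presigned URLs from a document dict; empty when return_url was not set."""
    urls = []
    for blob in document.get("document_blobs") or []:
        url = blob.get("presigned_url")
        if url:
            urls.append(url)
    # Fall back to the single backward-compatible field if no blob carried a URL.
    if not urls and document.get("presigned_url"):
        urls.append(document["presigned_url"])
    return urls


# Mirrors the sample response on this page (return_url=false): nothing to collect.
without_urls = blob_urls({"document_blobs": [], "presigned_urls": []})
# Made-up URLs, for illustration only.
from_blobs = blob_urls({"document_blobs": [{"presigned_url": "https://example.com/thumb.jpg"}]})
from_single = blob_urls({"document_blobs": [], "presigned_url": "https://example.com/source.txt"})
```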