POST /v1/collections/{collection_identifier}/documents
Create a document.
curl --request POST \
  --url https://api.mixpeek.com/v1/collections/{collection_identifier}/documents \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'X-Namespace: my-namespace' \
  --header 'Content-Type: application/json' \
  --data '
{
  "collection_id": "collection_123",
  "root_object_id": "<string>",
  "root_bucket_id": "<string>",
  "source_type": "bucket",
  "source_collection_id": "<string>",
  "source_document_id": "<string>",
  "source_object_id": "<string>",
  "lineage_path": "<string>",
  "lineage_chain": [
    {
      "collection_id": "<string>",
      "feature_extractor_id": "<string>",
      "document_id": "doc_frame001",
      "timestamp": "2023-11-07T05:31:56Z"
    }
  ],
  "document_schema_version": "<string>",
  "metadata": {},
  "features": [
    {
      "feature_extractor_id": "<string>",
      "payload": {},
      "vectors": {}
    }
  ]
}
'
{
  "document_id": "<string>",
  "collection_id": "<string>",
  "_internal": {
    "collection_id": "col_articles",
    "created_at": "2025-10-31T10:00:00Z",
    "document_id": "doc_f8966ff29c",
    "internal_id": "org_abc123",
    "lineage": {
      "path": "bkt_content/col_articles",
      "root_bucket_id": "bkt_content",
      "root_object_id": "obj_article_001",
      "source_object_id": "obj_article_001",
      "source_type": "bucket"
    },
    "metadata": {
      "ingestion_status": "COMPLETED"
    },
    "modality": "text",
    "namespace_id": "ns_xyz789",
    "updated_at": "2025-10-31T10:00:00Z"
  }
}

Headers

Authorization
string

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer YOUR_API_KEY"

"Bearer sk_xxxxxxxxxxxxx"

X-Namespace
string

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace ID or the namespace name. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'.

Examples:

"ns_abc123def456"

"production"

"my-namespace"
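Both required headers can be assembled once and reused across requests. The following is a minimal Python sketch: the helper name `build_headers` is hypothetical (it is not part of any Mixpeek SDK), and the key and namespace values are placeholders.

```python
def build_headers(api_key: str, namespace: str) -> dict[str, str]:
    """Return the two headers this page marks as required, plus the JSON content type."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    if not namespace:
        raise ValueError("namespace must not be empty")
    return {
        "Authorization": f"Bearer {api_key}",  # format from this page: 'Bearer sk_...'
        "X-Namespace": namespace,              # namespace ID (ns_...) or a custom name
        "Content-Type": "application/json",
    }


# Placeholder values; substitute your own key and namespace.
headers = build_headers("YOUR_API_KEY", "my-namespace")
```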

Path Parameters

collection_identifier
string
required

The ID of the collection.
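The path parameter is substituted into the endpoint URL. A short Python sketch of that substitution follows; `documents_url` is a hypothetical helper name, and percent-encoding the identifier is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote

BASE_URL = "https://api.mixpeek.com/v1"


def documents_url(collection_identifier: str) -> str:
    """Build the documents endpoint URL for one collection."""
    # safe="" also escapes "/" so the identifier always stays a single path segment.
    segment = quote(collection_identifier, safe="")
    return f"{BASE_URL}/collections/{segment}/documents"


url = documents_url("col_articles")
```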

Body

application/json

Request model for creating a document.

collection_id
string
required

ID of the collection the document belongs to.

Example:

"collection_123"

root_object_id
string | null

Optional denormalized root object identifier provided during creation.

root_bucket_id
string | null

Optional denormalized bucket identifier provided during creation.

source_type
enum<string> | null

Optional immediate parent type for the document.

Available options: bucket, collection

source_collection_id
string | null

Optional parent collection identifier when sourced from a collection.

source_document_id
string | null

Optional parent document identifier when sourced from a collection.

source_object_id
string | null

Optional parent object identifier when sourced directly from a bucket.
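The source_* fields describe two alternative parents: a bucket object, or a document in another collection. The sketch below is a client-side consistency check in Python based only on the field descriptions above. The function name `check_source_fields` is hypothetical, and the rule that the two groups should not be mixed is an assumption; the API may accept combinations this check flags.

```python
from typing import Optional

ALLOWED_SOURCE_TYPES = (None, "bucket", "collection")


def check_source_fields(
    source_type: Optional[str],
    source_object_id: Optional[str] = None,
    source_collection_id: Optional[str] = None,
    source_document_id: Optional[str] = None,
) -> list[str]:
    """Return a list of human-readable warnings; an empty list means no issue found."""
    problems = []
    if source_type not in ALLOWED_SOURCE_TYPES:
        problems.append(f"unknown source_type: {source_type!r}")
    # Assumption: bucket-sourced documents only carry source_object_id.
    if source_type == "bucket" and (source_collection_id or source_document_id):
        problems.append("source_type is 'bucket' but collection-source fields are set")
    # Assumption: collection-sourced documents do not carry source_object_id.
    if source_type == "collection" and source_object_id:
        problems.append("source_type is 'collection' but source_object_id is set")
    return problems


warnings = check_source_fields("bucket", source_object_id="obj_article_001")
```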

lineage_path
string | null

Optional materialized lineage path to set during creation.

lineage_chain
LineageStep · object[]

Processing steps from root object to this document. Recommended for decomposition trees.
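Each entry in lineage_chain can be built the same way. The Python sketch below uses the four keys that appear in the request sample on this page (collection_id, feature_extractor_id, document_id, timestamp). The helper name `lineage_step` and the extractor ID `fx_example` are hypothetical, and the UTC `Z` timestamp format is inferred from the sample value.

```python
from datetime import datetime, timezone
from typing import Optional


def lineage_step(
    collection_id: str,
    feature_extractor_id: str,
    document_id: str,
    when: Optional[datetime] = None,
) -> dict:
    """Build one lineage_chain entry with a UTC timestamp like 2023-11-07T05:31:56Z."""
    # Pass a timezone-aware datetime; a naive one is interpreted as local time.
    when = when or datetime.now(timezone.utc)
    return {
        "collection_id": collection_id,
        "feature_extractor_id": feature_extractor_id,
        "document_id": document_id,
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


step = lineage_step(
    "col_video_frames",
    "fx_example",  # hypothetical extractor ID
    "doc_frame001",
    datetime(2023, 11, 7, 5, 31, 56, tzinfo=timezone.utc),
)
lineage_chain = [step]
```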

document_schema_version
string | null

Optional document schema version (v1 or v2). If not provided, uses system default.

metadata
Metadata · object

Optional metadata dictionary for user-defined fields and custom attributes.

features
FeatureModel · object[]

Features to associate with the document.
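Putting the body fields together, a request can be prepared with the Python standard library. This is a sketch under stated assumptions: `build_create_request` is a hypothetical helper, the `category` metadata key is made up for illustration, and dropping unset optional fields instead of sending explicit nulls is a choice made here, not documented behaviour. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_create_request(url: str, headers: dict, collection_id: str, **optional_fields):
    """Prepare (but do not send) a POST request for the create-document endpoint."""
    body = {"collection_id": collection_id}
    # Leave out optional fields that were not provided (value None).
    body.update({k: v for k, v in optional_fields.items() if v is not None})
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


req = build_create_request(
    "https://api.mixpeek.com/v1/collections/col_articles/documents",
    {
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
        "X-Namespace": "my-namespace",
        "Content-Type": "application/json",
    },
    collection_id="col_articles",
    source_type="bucket",
    source_object_id="obj_article_001",
    lineage_path=None,                 # unset, so it is omitted from the body
    metadata={"category": "news"},     # hypothetical user-defined field
)
# urllib.request.urlopen(req) would send the request; it is not called here.
```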

Response

Successful Response

Response model for a single document.

This is the standard response format when fetching documents via API endpoints. Contains all document data plus optional presigned URLs for S3 blobs.

The document payload structure follows native Qdrant format:
- System fields are stored in _internal (lineage, metadata, blobs, etc.)
- User fields are at root level (brand_name, thumbnail_url, etc.)
- Only document_id and collection_id are Mixpeek IDs at root level
- No duplication between root and _internal

Query Parameters Affecting Response:
- return_url=true: Adds presigned_url to each document_blobs entry
- return_vectors=true: Includes embedding arrays in response

Use Cases:
- Display document details in UI
- Download source files or generated artifacts
- Understand document provenance and processing
- Access enrichment fields (flat) for filtering/display
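Because user fields sit at the root next to document_id, collection_id, and _internal, a client can separate the three groups with a few dictionary operations. A minimal Python sketch follows; `split_document` is a hypothetical helper, and the `brand_name` value is invented for illustration.

```python
ROOT_ID_KEYS = ("document_id", "collection_id")


def split_document(doc: dict) -> tuple[dict, dict, dict]:
    """Split a document response into (Mixpeek IDs, system fields, user fields)."""
    ids = {k: doc[k] for k in ROOT_ID_KEYS if k in doc}
    internal = doc.get("_internal", {})
    # Everything else at root level is treated as a user-defined field.
    user_fields = {
        k: v for k, v in doc.items() if k not in ROOT_ID_KEYS and k != "_internal"
    }
    return ids, internal, user_fields


doc = {
    "document_id": "doc_f8966ff29c",
    "collection_id": "col_articles",
    "brand_name": "Acme",  # hypothetical user field
    "_internal": {"modality": "text"},
}
ids, internal, user_fields = split_document(doc)
```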

document_id
string
required

REQUIRED. Unique identifier for the document. Format: 'doc_' prefix + alphanumeric characters. Use for: API queries, references, filtering.

Examples:

"doc_f8966ff29c18e20c6b45e053"

"doc_abc123"
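The stated format ('doc_' prefix + alphanumeric characters) can be checked on the client before an ID is used in a query. The Python sketch below reads "alphanumeric" as ASCII letters and digits, which is an assumption; `looks_like_document_id` is a hypothetical helper name.

```python
import re

# 'doc_' followed by one or more ASCII letters or digits (assumed reading of the format).
DOCUMENT_ID_PATTERN = re.compile(r"doc_[A-Za-z0-9]+")


def looks_like_document_id(value: str) -> bool:
    """Return True if the whole string matches the documented doc_ format."""
    return DOCUMENT_ID_PATTERN.fullmatch(value) is not None


ok = looks_like_document_id("doc_f8966ff29c18e20c6b45e053")
```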

collection_id
string
required

REQUIRED. ID of the collection this document belongs to. Format: 'col_' prefix + alphanumeric characters. Use for: Collection-scoped queries, filtering.

Examples:

"col_articles"

"col_video_frames"

_internal
InternalPayloadModel · object

System-managed internal fields. Contains all Mixpeek-managed metadata including lineage, processing info, timestamps, and blob references. User-defined fields appear at root level alongside document_id and collection_id.

Example:
{
  "collection_id": "col_articles",
  "created_at": "2025-10-31T10:00:00Z",
  "document_id": "doc_f8966ff29c",
  "internal_id": "org_abc123",
  "lineage": {
    "path": "bkt_content/col_articles",
    "root_bucket_id": "bkt_content",
    "root_object_id": "obj_article_001",
    "source_object_id": "obj_article_001",
    "source_type": "bucket"
  },
  "metadata": { "ingestion_status": "COMPLETED" },
  "modality": "text",
  "namespace_id": "ns_xyz789",
  "updated_at": "2025-10-31T10:00:00Z"
}
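The nested _internal fields can be read defensively so that a missing key does not raise. The Python sketch below uses only keys shown in the example above; the helper names are hypothetical, and `COMPLETED` is the only ingestion_status value that appears on this page, so other values are simply treated as not completed.

```python
def ingestion_completed(doc: dict) -> bool:
    """True if _internal.metadata.ingestion_status equals 'COMPLETED'."""
    status = doc.get("_internal", {}).get("metadata", {}).get("ingestion_status")
    return status == "COMPLETED"


def lineage_path_parts(doc: dict) -> list[str]:
    """Split _internal.lineage.path on '/' (empty list if the path is missing)."""
    path = doc.get("_internal", {}).get("lineage", {}).get("path", "")
    return [part for part in path.split("/") if part]


response = {
    "document_id": "doc_f8966ff29c",
    "collection_id": "col_articles",
    "_internal": {
        "lineage": {"path": "bkt_content/col_articles", "source_type": "bucket"},
        "metadata": {"ingestion_status": "COMPLETED"},
    },
}
done = ingestion_completed(response)
parts = lineage_path_parts(response)
```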