GET /v1/collections/{collection_identifier}/documents/{document_id}
Get a document by ID.
curl --request GET \
  --url https://api.mixpeek.com/v1/collections/{collection_identifier}/documents/{document_id} \
  --header 'Authorization: Bearer <token>' \
  --header 'X-Namespace: <x-namespace>'
{
  "collection_id": "col_articles",
  "description": "Text document without artifacts (return_url=false)",
  "document_blobs": [],
  "document_id": "doc_f8966ff29c18e20c6b45e053",
  "internal_metadata": {
    "ingestion_status": "COMPLETED"
  },
  "lineage_chain": [
    {
      "collection_id": "col_articles",
      "feature_extractor_id": "text_extractor_v1",
      "timestamp": "2025-10-31T10:00:00Z"
    }
  ],
  "lineage_path": "bkt_content/col_articles",
  "metadata": {
    "author": "Dr. Smith",
    "title": "AI in Healthcare"
  },
  "presigned_urls": [],
  "root_bucket_id": "bkt_content",
  "root_object_id": "obj_article_001",
  "source_blobs": [
    {
      "blob_id": "blob_text_001",
      "blob_property": "content",
      "blob_type": "text"
    }
  ],
  "source_object_id": "obj_article_001",
  "source_type": "bucket"
}

Authorizations

Authorization
string
header
required

Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings.

Headers

Authorization
string
required

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer sk_live_abc123def456"

"Bearer sk_test_xyz789"

X-Namespace
string
required

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace ID (format: ns_xxxxxxxxxxxxx) or the namespace name (a custom name like 'my-namespace').

Examples:

"ns_abc123def456"

"production"

"my-namespace"

Path Parameters

collection_identifier
string
required

The ID of the collection.

document_id
string
required

The ID of the document to retrieve.

Query Parameters

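return_url
boolean | null
default: false

When true, adds a presigned_url to each document_blobs entry and populates the presigned_url and presigned_urls fields.

return_vectors
boolean | null
default: false

When true, includes embedding arrays in the response (the vector field).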
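
The request described above can be assembled with Python's standard library. This is a minimal sketch rather than an official client: `build_get_document_request` is a hypothetical helper, and the API key, namespace, and IDs are placeholder values taken from the examples on this page. It only constructs the request; sending it (for example with `urllib.request.urlopen`) requires a valid API key.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://api.mixpeek.com/v1"


def build_get_document_request(api_key, namespace, collection_identifier,
                               document_id, return_url=False,
                               return_vectors=False):
    """Build (but do not send) a GET request for a single document."""
    path = "/collections/{}/documents/{}".format(
        quote(collection_identifier, safe=""),
        quote(document_id, safe=""),
    )
    # Both flags default to false, so only the ones that are set are sent.
    params = {}
    if return_url:
        params["return_url"] = "true"
    if return_vectors:
        params["return_vectors"] = "true"
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    # urllib normalizes header capitalization; HTTP header names are
    # case-insensitive, so "X-Namespace" is still sent correctly.
    return Request(
        url,
        method="GET",
        headers={
            "Authorization": "Bearer " + api_key,
            "X-Namespace": namespace,
        },
    )


req = build_get_document_request(
    "sk_test_xyz789", "my-namespace", "col_articles",
    "doc_f8966ff29c18e20c6b45e053", return_url=True,
)
print(req.get_method(), req.full_url)
```

Percent-encoding the path segments is a defensive choice for the case where a collection is addressed by a custom name rather than an ID.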

Response

Successful Response

Response model for a single document.

This is the standard response format when fetching documents via API endpoints. Contains all document data plus optional presigned URLs for S3 blobs.

Key Fields:
- document_id: Application ID for queries/references
- lineage (lineage_path, lineage_chain): Complete processing history and source tracking
- User-defined fields: Any fields from source passed via field_passthrough
- source_blobs: Which blobs from source object were processed
- document_blobs: Artifacts generated by extractor (thumbnails, etc.)
- enrichment fields: Flat taxonomy/cluster fields (e.g., taxonomy_*_label, cluster_id)

Query Parameters Affecting Response:
- return_url=true: Adds presigned_url to each document_blobs entry
- return_vectors=true: Includes embedding arrays in response

Use Cases:
- Display document details in UI
- Download source files or generated artifacts
- Understand document provenance and processing
- Access enrichment fields (flat) for filtering/display

document_id
string
required

REQUIRED. Unique identifier for the document. Format: 'doc_' prefix + alphanumeric characters. Use for: API queries, references, filtering.

Examples:

"doc_f8966ff29c18e20c6b45e053"

"doc_abc123"

collection_id
string
required

REQUIRED. ID of the collection this document belongs to. Format: 'col_' prefix + alphanumeric characters. Use for: Collection-scoped queries, filtering.

Examples:

"col_articles"

"col_video_frames"

root_object_id
string | null

Denormalized root object identifier for the document.

Examples:

"obj_video123"

root_bucket_id
string | null

Denormalized bucket identifier for the document's root object.

Examples:

"bkt_marketing"

source_type
enum<string> | null

Immediate parent type that produced this document (bucket or collection).

Available options: bucket, collection

Examples:

"bucket"

"collection"

source_collection_id
string | null

Collection identifier of the immediate parent when sourced from a collection.

Examples:

"col_frames"

source_document_id
string | null

Document identifier of the immediate parent when sourced from a collection.

Examples:

"doc_frame_050"

source_object_id
string | null

Bucket object identifier of the immediate parent when sourced from a bucket.

Examples:

"obj_video_123"

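The source_type field determines which of the source_* identifiers describe the immediate parent. A minimal sketch of that lookup, assuming the response has already been decoded into a Python dict (`immediate_parent` is a hypothetical helper, not part of any SDK):

```python
def immediate_parent(doc):
    """Return a (kind, ids) tuple describing the document's immediate parent."""
    source_type = doc.get("source_type")
    if source_type == "bucket":
        return ("bucket", {"object_id": doc.get("source_object_id")})
    if source_type == "collection":
        return ("collection", {
            "collection_id": doc.get("source_collection_id"),
            "document_id": doc.get("source_document_id"),
        })
    # source_type is nullable, so the parent may be unknown.
    return (None, {})


from_bucket = immediate_parent(
    {"source_type": "bucket", "source_object_id": "obj_article_001"}
)
from_collection = immediate_parent({
    "source_type": "collection",
    "source_collection_id": "col_frames",
    "source_document_id": "doc_frame_050",
})
print(from_bucket)
print(from_collection)
```
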
lineage_path
string | null

Materialized lineage path for fast lookups (e.g., 'bkt_123/col_frames/col_scenes').

Examples:

"bkt_marketing/col_frames/col_scenes"

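The lineage path can be split on "/" to recover its segments. The sketch below assumes, based only on the examples on this page, that the first segment is the root bucket and the remaining segments are collections in processing order; `parse_lineage_path` is a hypothetical helper.

```python
def parse_lineage_path(lineage_path):
    """Split a lineage path into (root_bucket_id, [collection_id, ...])."""
    if not lineage_path:
        # lineage_path is nullable.
        return None, []
    parts = lineage_path.split("/")
    # Assumption: first segment is the bucket, the rest are collections.
    return parts[0], parts[1:]


bucket_id, collections = parse_lineage_path("bkt_marketing/col_frames/col_scenes")
print(bucket_id)    # bkt_marketing
print(collections)  # ['col_frames', 'col_scenes']
```
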
lineage_chain
LineageStep · object[]

Processing steps from root object to this document. Contains: collection_id, feature_extractor_id, document_id (if intermediate), timestamp. Use for: Visualizing pipeline, debugging, audit trail.
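
For debugging or an audit trail, the steps can be rendered as one line each. This is a sketch that relies only on the step fields named above (collection_id, feature_extractor_id, document_id, timestamp); `describe_lineage` is a hypothetical helper and the output format is an arbitrary choice.

```python
def describe_lineage(doc):
    """Return one human-readable line per lineage_chain step."""
    lines = []
    for step in doc.get("lineage_chain") or []:
        line = "{} {} via {}".format(
            step.get("timestamp"),
            step.get("collection_id"),
            step.get("feature_extractor_id"),
        )
        # document_id is only present on intermediate steps.
        if step.get("document_id"):
            line += " [intermediate document {}]".format(step["document_id"])
        lines.append(line)
    return lines


sample = {
    "lineage_chain": [
        {
            "collection_id": "col_articles",
            "feature_extractor_id": "text_extractor_v1",
            "timestamp": "2025-10-31T10:00:00Z",
        }
    ]
}
trail = describe_lineage(sample)
print("\n".join(trail))
```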

source_blobs
Source Blobs · object[]

Lightweight references to source object's blobs (blob_id, blob_property, blob_type). For full details, fetch: GET /buckets/{bucket_id}/objects/{object_id}
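
The follow-up request for full blob details can be derived from the document itself. The sketch below makes several assumptions that this page does not confirm: that the objects endpoint lives under the same https://api.mixpeek.com/v1 base URL, that root_bucket_id is the right bucket, and that source_object_id (falling back to root_object_id) is the right object. `source_object_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Assumption: same host and /v1 prefix as the documents endpoint.
BASE_URL = "https://api.mixpeek.com/v1"


def source_object_url(doc):
    """Return the URL of the document's source object, or None if unknown."""
    bucket_id = doc.get("root_bucket_id")
    # Assumption: prefer the immediate bucket parent, else the root object.
    object_id = doc.get("source_object_id") or doc.get("root_object_id")
    if not bucket_id or not object_id:
        return None
    return "{}/buckets/{}/objects/{}".format(
        BASE_URL, quote(bucket_id, safe=""), quote(object_id, safe="")
    )


object_url = source_object_url({
    "root_bucket_id": "bkt_content",
    "root_object_id": "obj_article_001",
    "source_object_id": "obj_article_001",
})
print(object_url)
```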

internal_metadata
object

System metadata (ingestion_status, feature_extractor_config_hash, etc.)

vector
number[] | null

Vector embedding for the document (only included when return_vectors=true)

presigned_url
string | null

Optional. Only populated when the return_url=true query parameter is used. Single presigned URL for the primary source blob (kept for backward compatibility). For multiple blobs, use the presigned_urls array or document_blobs[].presigned_url instead.

document_blobs
BlobURLRef · object[]

Artifacts generated during feature extraction (thumbnails, processed outputs). When return_url=true, each entry includes a presigned_url field.

presigned_urls
PresignedURLModel · object[]

Aggregated presigned URLs for all blobs. Only populated when return_url=true query parameter is provided.
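
When return_url=true, download links can be gathered from the decoded response. The sketch below reads only the top-level presigned_url and document_blobs[].presigned_url, because the entry shape of presigned_urls is not described on this page. `collect_presigned_urls` is a hypothetical helper, and the example.com URLs are made-up placeholders.

```python
def collect_presigned_urls(doc):
    """Return the presigned URLs found on a document, without duplicates."""
    urls = []
    primary = doc.get("presigned_url")
    if primary:
        urls.append(primary)
    for blob in doc.get("document_blobs") or []:
        url = blob.get("presigned_url")
        if url and url not in urls:
            urls.append(url)
    return urls


# With return_url=false (the default) nothing is populated.
no_urls = collect_presigned_urls({"document_blobs": [], "presigned_urls": []})

# Placeholder values standing in for a return_url=true response.
some_urls = collect_presigned_urls({
    "presigned_url": "https://example.com/source",
    "document_blobs": [
        {"presigned_url": "https://example.com/thumbnail"},
        {"presigned_url": "https://example.com/source"},
    ],
})
print(no_urls)
print(some_urls)
```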