Authorizations
Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings.
Headers
REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.
"Bearer sk_live_abc123def456"
"Bearer sk_test_xyz789"
REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'
"ns_abc123def456"
"production"
"my-namespace"
Body
Request model for creating a new collection.
Collections process data from buckets or other collections using a single feature extractor.
KEY ARCHITECTURAL CHANGE: Each collection has EXACTLY ONE feature extractor.
- Use field_passthrough to include additional source fields in output documents
- Multiple extractors = multiple collections
- This simplifies processing and makes output schema deterministic
CRITICAL: To use input_mappings:
- Your source bucket MUST have a bucket_schema defined
- The input_mappings reference fields from that bucket_schema
- The system validates that mapped fields exist in the source schema
Example workflow:
- Create bucket with schema: { "properties": { "image": {"type": "image"}, "title": {"type": "string"} } }
- Upload objects conforming to that schema
- Create collection with:
- input_mappings: { "image": "image" }
- field_passthrough: [{"source_path": "title"}]
- Output documents will have both extractor outputs AND passthrough fields
Schema Computation:
- output_schema is computed IMMEDIATELY when collection is created
- output_schema = field_passthrough fields + extractor output fields
- No waiting for documents to be processed!
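Putting the workflow above together, a create-collection request body could look like the following sketch. The top-level key names (collection_name, source, feature_extractor, input_schema, enabled, metadata) are assumptions based on the field descriptions below and should be checked against the request schema; the bucket ID and source field names are hypothetical, while the nested keys (feature_extractor_name, input_mappings, field_passthrough, source_path) follow the examples on this page:

{
  "collection_name": "product_embeddings",
  "description": "Product images embedded with CLIP",
  "source": {
    "type": "bucket",
    "bucket_ids": ["bkt_products"]
  },
  "feature_extractor": {
    "feature_extractor_name": "image_extractor",
    "version": "v1",
    "input_mappings": { "image": "image" },
    "field_passthrough": [{ "source_path": "title" }],
    "parameters": { "model": "clip-vit-base-patch32" }
  },
  "enabled": true,
  "metadata": { "team": "data-science" }
}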
Name of the collection to create
"video_frames"
"product_embeddings"
Source configuration (bucket or collection) for this collection
{
  "bucket_ids": ["bkt_marketing_videos"],
  "description": "Single bucket source",
  "type": "bucket"
}

{
  "bucket_ids": [
    "bkt_us_products",
    "bkt_eu_products",
    "bkt_asia_products"
  ],
  "description": "Multi-bucket source (region consolidation)",
  "type": "bucket"
}

{
  "bucket_ids": [
    "bkt_marketing_assets",
    "bkt_sales_assets",
    "bkt_support_assets"
  ],
  "description": "Multi-bucket source (team consolidation)",
  "type": "bucket"
}

{
  "collection_id": "col_video_frames",
  "description": "Collection source (decomposition tree)",
  "type": "collection"
}

Single feature extractor for this collection. Use field_passthrough within the extractor config to include additional source fields. For multiple extractors, create multiple collections and use collection-to-collection pipelines.
{
  "description": "Text extractor with field passthrough",
  "feature_extractor_name": "text_extractor",
  "field_passthrough": [
    { "required": true, "source_path": "title" },
    { "source_path": "author" }
  ],
  "input_mappings": { "text": "content" },
  "parameters": { "model": "text-embedding-3-small" },
  "version": "v1"
}

{
  "description": "Video extractor with campaign metadata",
  "feature_extractor_name": "video_extractor",
  "field_passthrough": [
    { "source_path": "campaign_id" },
    { "source_path": "duration_seconds" }
  ],
  "input_mappings": { "video": "video_url" },
  "parameters": { "fps": 1 },
  "version": "v1"
}

{
  "description": "Image extractor with all source fields",
  "feature_extractor_name": "image_extractor",
  "include_all_source_fields": true,
  "input_mappings": { "image": "product_image" },
  "parameters": { "model": "clip-vit-base-patch32" },
  "version": "v1"
}

{
  "description": "Text extractor with field renaming",
  "feature_extractor_name": "text_extractor",
  "field_passthrough": [
    {
      "source_path": "metadata.author",
      "target_path": "author"
    },
    {
      "source_path": "metadata.created_at",
      "target_path": "created_at"
    }
  ],
  "input_mappings": { "text": "content" },
  "version": "v1"
}

Description of the collection
Input schema for the collection. If not provided, it is inferred from the source bucket's bucket_schema or the source collection's output_schema. REQUIRED for input_mappings to work: it defines which fields can be mapped to feature extractors. Uses the same schema definition format as bucket objects.
IMPORTANT: The bucket schema defines what fields your bucket objects will have. This schema is REQUIRED if you want to:
- Create collections that use input_mappings to process your bucket data
- Validate object structure before ingestion
- Enable type-safe data pipelines
The schema defines the custom fields that will be used in:
- Blob properties (e.g., "content", "thumbnail", "transcript")
- Object metadata structure
- Blob data structures
Example workflow:
- Create bucket WITH schema defining your data structure
- Upload objects that conform to that schema
- Create collections that map schema fields to feature extractors
Without a bucket_schema, collections cannot use input_mappings.
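For example, a bucket_schema supporting the workflow above might look like this (field names are illustrative; the properties/required shape follows the schema examples elsewhere on this page):

{
  "properties": {
    "image": { "type": "image" },
    "title": { "type": "string" },
    "price": { "type": "number" }
  },
  "required": ["image", "title"]
}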
Whether the collection is enabled
Additional metadata for the collection
Response
Successful Response
Response model for collection endpoints.
REQUIRED. Human-readable name for the collection. Must be unique within the namespace. Used for: Display, lookups (can query by name or ID), organization. Format: Alphanumeric with underscores/hyphens, 3-100 characters. Examples: 'product_embeddings', 'video_frames', 'customer_documents'.
Length: 3 - 100
"video_frames"
"product_embeddings"
"customer_documents"
REQUIRED (auto-computed from source). Input schema defining fields available to the feature extractor. Source: bucket.bucket_schema (if source.type='bucket') OR upstream_collection.output_schema (if source.type='collection'). Determines: Which fields can be used in input_mappings and field_passthrough. This is the 'left side' of the transformation - what data goes IN. Format: BucketSchema with properties dict. Use for: Validating input_mappings, configuring field_passthrough.
REQUIRED (auto-computed at creation). Output schema defining fields in final documents. Computed as: field_passthrough fields + extractor output fields (deterministic). Known IMMEDIATELY when collection is created - no waiting for documents! This is the 'right side' of the transformation - what data comes OUT. Use for: Understanding document structure, building queries, schema validation. Example: {title (passthrough), embedding (extractor output)} = output_schema.
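As a sketch of that computation: a collection with field_passthrough on 'title' plus a text extractor that emits an 'embedding' field could end up with an output_schema like the one below. This assumes output_schema uses the same properties-dict shape as input_schema; the 'vector' type name and the extractor's output field are hypothetical.

{
  "properties": {
    "title": { "type": "string" },
    "embedding": { "type": "vector" }
  }
}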
REQUIRED. Single feature extractor configuration for this collection. Defines: extractor name/version, input_mappings, field_passthrough, parameters. Each collection has exactly ONE extractor (earlier versions supported multiple). For multiple extractors: Create multiple collections and use collection-to-collection pipelines. Use field_passthrough to include additional source fields beyond extractor outputs.
{
  "description": "Text extractor with field passthrough",
  "feature_extractor_name": "text_extractor",
  "field_passthrough": [
    { "required": true, "source_path": "title" },
    { "source_path": "author" }
  ],
  "input_mappings": { "text": "content" },
  "parameters": { "model": "text-embedding-3-small" },
  "version": "v1"
}

{
  "description": "Video extractor with campaign metadata",
  "feature_extractor_name": "video_extractor",
  "field_passthrough": [
    { "source_path": "campaign_id" },
    { "source_path": "duration_seconds" }
  ],
  "input_mappings": { "video": "video_url" },
  "parameters": { "fps": 1 },
  "version": "v1"
}

{
  "description": "Image extractor with all source fields",
  "feature_extractor_name": "image_extractor",
  "include_all_source_fields": true,
  "input_mappings": { "image": "product_image" },
  "parameters": { "model": "clip-vit-base-patch32" },
  "version": "v1"
}

{
  "description": "Text extractor with field renaming",
  "feature_extractor_name": "text_extractor",
  "field_passthrough": [
    {
      "source_path": "metadata.author",
      "target_path": "author"
    },
    {
      "source_path": "metadata.created_at",
      "target_path": "created_at"
    }
  ],
  "input_mappings": { "text": "content" },
  "version": "v1"
}

REQUIRED. Source configuration defining where data comes from. Type 'bucket': Process objects from one or more buckets (tier 1). Type 'collection': Process documents from upstream collection (tier 2+). For multi-bucket sources, all buckets must have compatible schemas. Determines input_schema and enables decomposition trees.
{
  "bucket_ids": ["bkt_marketing_videos"],
  "description": "Single bucket source",
  "type": "bucket"
}

{
  "bucket_ids": [
    "bkt_us_products",
    "bkt_eu_products",
    "bkt_asia_products"
  ],
  "description": "Multi-bucket source (region consolidation)",
  "type": "bucket"
}

{
  "bucket_ids": [
    "bkt_marketing_assets",
    "bkt_sales_assets",
    "bkt_support_assets"
  ],
  "description": "Multi-bucket source (team consolidation)",
  "type": "bucket"
}

{
  "collection_id": "col_video_frames",
  "description": "Collection source (decomposition tree)",
  "type": "collection"
}

NOT REQUIRED (auto-generated). Unique identifier for this collection. Used for: API paths, document queries, pipeline references. Format: 'col_' prefix + 10 random alphanumeric characters. Stable after creation - use for all collection references.
"col_a1b2c3d4e5"
"col_xyz789abc"
"col_video_frames"
NOT REQUIRED. Human-readable description of the collection's purpose. Use for: Documentation, team communication, UI display. Common pattern: Describe what the collection contains and what processing is applied.
"Video frames extracted at 1 FPS with CLIP embeddings"
"Product catalog with image embeddings and metadata"
NOT REQUIRED (auto-computed). Snapshot of bucket schemas at collection creation. Only populated for multi-bucket collections (source.type='bucket' with multiple bucket_ids). Key: bucket_id, Value: BucketSchema at time of collection creation. Used for: Schema compatibility validation, document lineage, debugging. Schema snapshot is immutable - bucket schema changes after collection creation do not affect this. Single-bucket collections may omit this field (schema in input_schema is sufficient).
null

{
  "bkt_eu_products": {
    "properties": {
      "image": { "type": "image" },
      "title": { "type": "string" },
      "price": { "type": "number" }
    },
    "required": ["image", "title"]
  },
  "bkt_us_products": {
    "properties": {
      "image": { "type": "image" },
      "title": { "type": "string" },
      "price": { "type": "number" }
    },
    "required": ["image", "title"]
  }
}

NOT REQUIRED (auto-computed). Lineage chain showing complete processing history. Each entry contains: source_config, feature_extractor, output_schema for one tier. Length indicates processing depth (1 = tier 1, 2 = tier 2, etc.). Use for: Understanding multi-tier pipelines, visualizing decomposition trees.
[]

[
  {
    "output_schema": {},
    "source_config": {
      "bucket_id": "bkt_videos",
      "type": "bucket"
    }
  }
]

NOT REQUIRED (auto-computed from extractor). Vector indexes for semantic search. Populated from feature_extractor.required_vector_indexes. Defines: Which embeddings are indexed, dimensions, distance metrics. Use for: Understanding search capabilities, debugging vector queries.
[]

[
  {
    "distance": "cosine",
    "name": "text_extractor_v1_embedding",
    "size": 1024
  }
]

NOT REQUIRED (auto-computed from extractor + namespace). Payload indexes for filtering. Enables efficient filtering on metadata fields, timestamps, IDs. Populated from: extractor requirements + namespace defaults. Use for: Understanding which fields support fast filtering.
[]

[
  {
    "field_name": "collection_id",
    "type": "keyword"
  }
]

NOT REQUIRED (defaults to True). Whether the collection accepts new documents. False: Collection exists but won't process new objects. True: Active and processing. Use for: Temporarily disabling collections without deletion.
true
false
NOT REQUIRED. Additional user-defined metadata for the collection. Arbitrary key-value pairs for custom organization, tracking, configuration. Not used by the platform - purely for user purposes. Common uses: team ownership, project tags, deployment environment.
{
  "environment": "production",
  "project": "Q4_campaign",
  "team": "data-science"
}

null
Timestamp when the collection was created. Automatically set by the system when the collection is first saved to the database.
Timestamp when the collection was last updated. Automatically updated by the system whenever the collection is modified.
Number of documents in the collection
NOT REQUIRED. List of taxonomies to apply to documents in this collection. Each entry specifies: taxonomy_id, execution_mode (materialize/on_demand), optional filters. Materialized: Enrichments persisted to documents during ingestion. On-demand: Enrichments computed during retrieval (via taxonomy_join stages). Empty/null if no taxonomies attached. Use for: Categorization, hierarchical classification.
null
[
  {
    "execution_mode": "on_demand",
    "taxonomy_id": "tax_categories"
  },
  {
    "execution_mode": "materialize",
    "taxonomy_id": "tax_brands"
  }
]

Number of taxonomies connected to this collection
Number of retrievers connected to this collection

