Create Collection

Authorizations

Authorization

string

header

required

Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings.

Headers

Authorization

string

required

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer sk_live_abc123def456"

"Bearer sk_test_xyz789"

X-Namespace

string

required

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'

Examples:

"ns_abc123def456"

"production"

"my-namespace"

Body

application/json

Request model for creating a new collection.

Collections process data from buckets or other collections using a single feature extractor.

KEY ARCHITECTURAL CHANGE: Each collection has EXACTLY ONE feature extractor.

Use field_passthrough to include additional source fields in output documents
Multiple extractors = multiple collections
This simplifies processing and makes output schema deterministic

CRITICAL: To use input_mappings:

Your source bucket MUST have a bucket_schema defined
The input_mappings reference fields from that bucket_schema
The system validates that mapped fields exist in the source schema

Example workflow:

Create bucket with schema: { "properties": { "image": {"type": "image"}, "title": {"type": "string"} } }
Upload objects conforming to that schema
Create collection with:
- input_mappings: { "image": "image" }
- field_passthrough: [{"source_path": "title"}]
Output documents will have both extractor outputs AND passthrough fields

Schema Computation:

output_schema is computed IMMEDIATELY when collection is created
output_schema = field_passthrough fields + extractor output fields
No waiting for documents to be processed!

collection_name

string

required

Name of the collection to create

Examples:

"video_frames"

"product_embeddings"

source

object

required

Source configuration (bucket or collection) for this collection

Show child attributes

Examples:

{
  "bucket_ids": ["bkt_marketing_videos"],
  "description": "Single bucket source",
  "type": "bucket"
}

{
  "bucket_ids": ["bkt_marketing_ads"],
  "description": "Bucket source with video-only filter",
  "source_filters": {
    "filters": {
      "AND": [
        {
          "field": "blobs.type",
          "operator": "eq",
          "value": "video"
        }
      ]
    }
  },
  "type": "bucket"
}

{
  "bucket_ids": [
    "bkt_us_products",
    "bkt_eu_products",
    "bkt_asia_products"
  ],
  "description": "Multi-bucket source (region consolidation)",
  "type": "bucket"
}

{
  "bucket_ids": ["bkt_us_ads", "bkt_eu_ads", "bkt_asia_ads"],
  "description": "Multi-bucket source with brand filter",
  "source_filters": {
    "filters": {
      "OR": [
        {
          "field": "brand_name",
          "operator": "in",
          "value": ["Acme", "TechCo"]
        },
        {
          "field": "category",
          "operator": "eq",
          "value": "premium"
        }
      ]
    }
  },
  "type": "bucket"
}

{
  "collection_id": "col_video_frames",
  "description": "Collection source (decomposition tree)",
  "type": "collection"
}

{
  "collection_id": "col_raw_frames",
  "description": "Collection source with active documents only",
  "source_filters": {
    "filters": {
      "AND": [
        {
          "field": "__fully_enriched",
          "operator": "eq",
          "value": true
        },
        {
          "field": "quality_score",
          "operator": "gte",
          "value": 0.8
        }
      ]
    }
  },
  "type": "collection"
}

feature_extractor

object

required

Single feature extractor for this collection. Use field_passthrough within the extractor config to include additional source fields. For multiple extractors, create multiple collections and use collection-to-collection pipelines.

Show child attributes

Examples:

{
  "description": "Text extractor with field passthrough",
  "feature_extractor_name": "text_extractor",
  "field_passthrough": [
    { "required": true, "source_path": "title" },
    { "source_path": "author" }
  ],
  "input_mappings": { "text": "content" },
  "parameters": { "model": "text-embedding-3-small" },
  "version": "v1"
}

{
  "description": "Video extractor with campaign metadata",
  "feature_extractor_name": "multimodal_extractor",
  "field_passthrough": [
    { "source_path": "campaign_id" },
    { "source_path": "duration_seconds" }
  ],
  "input_mappings": { "video": "video_url" },
  "parameters": { "fps": 1 },
  "version": "v1"
}

{
  "description": "Image extractor with all source fields",
  "feature_extractor_name": "image_extractor",
  "include_all_source_fields": true,
  "input_mappings": { "image": "product_image" },
  "parameters": { "model": "clip-vit-base-patch32" },
  "version": "v1"
}

{
  "description": "Text extractor with field renaming",
  "feature_extractor_name": "text_extractor",
  "field_passthrough": [
    {
      "source_path": "metadata.author",
      "target_path": "author"
    },
    {
      "source_path": "metadata.created_at",
      "target_path": "created_at"
    }
  ],
  "input_mappings": { "text": "content" },
  "version": "v1"
}

description

string | null

Description of the collection

input_schema

object | null

Input schema for the collection. If not provided, inferred from source bucket's bucket_schema or source collection's output_schema. REQUIRED for input_mappings to work - defines what fields can be mapped to feature extractors. Schema definition for bucket objects.

IMPORTANT: The bucket schema defines what fields your bucket objects will have. This schema is REQUIRED if you want to:

Create collections that use input_mappings to process your bucket data
Validate object structure before ingestion
Enable type-safe data pipelines

The schema defines the custom fields that will be used in:

Blob properties (e.g., "content", "thumbnail", "transcript")
Object metadata structure
Blob data structures

Example workflow:

Create bucket WITH schema defining your data structure
Upload objects that conform to that schema
Create collections that map schema fields to feature extractors

Without a bucket_schema, collections cannot use input_mappings.

Show child attributes

enabled

boolean

default:true

Whether the collection is enabled

metadata

object | null

Additional metadata for the collection

taxonomy_applications

TaxonomyApplicationConfig · object[] | null

Optional taxonomy applications to automatically enrich documents in this collection. Each taxonomy will classify/enrich documents based on configured retriever matches.

Show child attributes

Response

Successful Response

Response model for collection endpoints.

collection_name

string

required

REQUIRED. Human-readable name for the collection. Must be unique within the namespace. Used for: Display, lookups (can query by name or ID), organization. Format: Alphanumeric with underscores/hyphens, 3-100 characters. Examples: 'product_embeddings', 'video_frames', 'customer_documents'.

Required string length: 3 - 100

Examples:

"video_frames"

"product_embeddings"

"customer_documents"

input_schema

object

required

REQUIRED (auto-computed from source). Input schema defining fields available to the feature extractor. Source: bucket.bucket_schema (if source.type='bucket') OR upstream_collection.output_schema (if source.type='collection'). Determines: Which fields can be used in input_mappings and field_passthrough. This is the 'left side' of the transformation - what data goes IN. Format: BucketSchema with properties dict. Use for: Validating input_mappings, configuring field_passthrough.

Show child attributes

output_schema

object

required

REQUIRED (auto-computed at creation). Output schema defining fields in final documents. Computed as: field_passthrough fields + extractor output fields (deterministic). Known IMMEDIATELY when collection is created - no waiting for documents! This is the 'right side' of the transformation - what data comes OUT. Use for: Understanding document structure, building queries, schema validation. Example: {title (passthrough), embedding (extractor output)} = output_schema.

Show child attributes

feature_extractor

object

required

REQUIRED. Single feature extractor configuration for this collection. Defines: extractor name/version, input_mappings, field_passthrough, parameters. Task 9 change: ONE extractor per collection (previously supported multiple). For multiple extractors: Create multiple collections and use collection-to-collection pipelines. Use field_passthrough to include additional source fields beyond extractor outputs.

Show child attributes

Examples:

{
  "description": "Text extractor with field passthrough",
  "feature_extractor_name": "text_extractor",
  "field_passthrough": [
    { "required": true, "source_path": "title" },
    { "source_path": "author" }
  ],
  "input_mappings": { "text": "content" },
  "parameters": { "model": "text-embedding-3-small" },
  "version": "v1"
}

{
  "description": "Video extractor with campaign metadata",
  "feature_extractor_name": "multimodal_extractor",
  "field_passthrough": [
    { "source_path": "campaign_id" },
    { "source_path": "duration_seconds" }
  ],
  "input_mappings": { "video": "video_url" },
  "parameters": { "fps": 1 },
  "version": "v1"
}

{
  "description": "Image extractor with all source fields",
  "feature_extractor_name": "image_extractor",
  "include_all_source_fields": true,
  "input_mappings": { "image": "product_image" },
  "parameters": { "model": "clip-vit-base-patch32" },
  "version": "v1"
}

{
  "description": "Text extractor with field renaming",
  "feature_extractor_name": "text_extractor",
  "field_passthrough": [
    {
      "source_path": "metadata.author",
      "target_path": "author"
    },
    {
      "source_path": "metadata.created_at",
      "target_path": "created_at"
    }
  ],
  "input_mappings": { "text": "content" },
  "version": "v1"
}

source

object

required

REQUIRED. Source configuration defining where data comes from. Type 'bucket': Process objects from one or more buckets (tier 1). Type 'collection': Process documents from upstream collection (tier 2+). For multi-bucket sources, all buckets must have compatible schemas. Determines input_schema and enables decomposition trees.

Show child attributes

Examples:

{
  "bucket_ids": ["bkt_marketing_videos"],
  "description": "Single bucket source",
  "type": "bucket"
}

{
  "bucket_ids": ["bkt_marketing_ads"],
  "description": "Bucket source with video-only filter",
  "source_filters": {
    "filters": {
      "AND": [
        {
          "field": "blobs.type",
          "operator": "eq",
          "value": "video"
        }
      ]
    }
  },
  "type": "bucket"
}

{
  "bucket_ids": [
    "bkt_us_products",
    "bkt_eu_products",
    "bkt_asia_products"
  ],
  "description": "Multi-bucket source (region consolidation)",
  "type": "bucket"
}

{
  "bucket_ids": ["bkt_us_ads", "bkt_eu_ads", "bkt_asia_ads"],
  "description": "Multi-bucket source with brand filter",
  "source_filters": {
    "filters": {
      "OR": [
        {
          "field": "brand_name",
          "operator": "in",
          "value": ["Acme", "TechCo"]
        },
        {
          "field": "category",
          "operator": "eq",
          "value": "premium"
        }
      ]
    }
  },
  "type": "bucket"
}

{
  "collection_id": "col_video_frames",
  "description": "Collection source (decomposition tree)",
  "type": "collection"
}

{
  "collection_id": "col_raw_frames",
  "description": "Collection source with active documents only",
  "source_filters": {
    "filters": {
      "AND": [
        {
          "field": "__fully_enriched",
          "operator": "eq",
          "value": true
        },
        {
          "field": "quality_score",
          "operator": "gte",
          "value": 0.8
        }
      ]
    }
  },
  "type": "collection"
}

collection_id

string

NOT REQUIRED (auto-generated). Unique identifier for this collection. Used for: API paths, document queries, pipeline references. Format: 'col_' prefix + 10 random alphanumeric characters. Stable after creation - use for all collection references.

Examples:

"col_a1b2c3d4e5"

"col_xyz789abc"

"col_video_frames"

description

string | null

NOT REQUIRED. Human-readable description of the collection's purpose. Use for: Documentation, team communication, UI display. Common pattern: Describe what the collection contains and what processing is applied.

Examples:

"Video frames extracted at 1 FPS with CLIP embeddings"

"Product catalog with image embeddings and metadata"

source_bucket_schemas

object | null

NOT REQUIRED (auto-computed). Snapshot of bucket schemas at collection creation. Only populated for multi-bucket collections (source.type='bucket' with multiple bucket_ids). Key: bucket_id, Value: BucketSchema at time of collection creation. Used for: Schema compatibility validation, document lineage, debugging. Schema snapshot is immutable - bucket schema changes after collection creation do not affect this. Single-bucket collections may omit this field (schema in input_schema is sufficient).

Show child attributes

Examples:

null

{
  "bkt_eu_products": {
    "properties": {
      "image": { "type": "image" },
      "title": { "type": "string" },
      "price": { "type": "number" }
    },
    "required": ["image", "title"]
  },
  "bkt_us_products": {
    "properties": {
      "image": { "type": "image" },
      "title": { "type": "string" },
      "price": { "type": "number" }
    },
    "required": ["image", "title"]
  }
}

source_lineage

SingleLineageEntry · object[]

NOT REQUIRED (auto-computed). Lineage chain showing complete processing history. Each entry contains: source_config, feature_extractor, output_schema for one tier. Length indicates processing depth (1 = tier 1, 2 = tier 2, etc.). Use for: Understanding multi-tier pipelines, visualizing decomposition trees.

Show child attributes

Examples:

[]

[
  {
    "output_schema": {},
    "source_config": {
      "bucket_id": "bkt_videos",
      "type": "bucket"
    }
  }
]

vector_indexes

any[]

NOT REQUIRED (auto-computed from extractor). Vector indexes for semantic search. Populated from feature_extractor.required_vector_indexes. Defines: Which embeddings are indexed, dimensions, distance metrics. Use for: Understanding search capabilities, debugging vector queries.

Examples:

[]

[
  {
    "distance": "cosine",
    "name": "text_extractor_v1_embedding",
    "size": 1024
  }
]

payload_indexes

any[]

NOT REQUIRED (auto-computed from extractor + namespace). Payload indexes for filtering. Enables efficient filtering on metadata fields, timestamps, IDs. Populated from: extractor requirements + namespace defaults. Use for: Understanding which fields support fast filtering.

Examples:

[]

[
  {
    "field_name": "collection_id",
    "type": "keyword"
  }
]

enabled

boolean

default:true

NOT REQUIRED (defaults to True). Whether the collection accepts new documents. False: Collection exists but won't process new objects. True: Active and processing. Use for: Temporarily disabling collections without deletion.

Examples:

true

false

metadata

object | null

NOT REQUIRED. Additional user-defined metadata for the collection. Arbitrary key-value pairs for custom organization, tracking, configuration. Not used by the platform - purely for user purposes. Common uses: team ownership, project tags, deployment environment.

Examples:

{
  "environment": "production",
  "project": "Q4_campaign",
  "team": "data-science"
}

null

created_at

string<date-time> | null

Timestamp when the collection was created. Automatically set by the system when the collection is first saved to the database.

updated_at

string<date-time> | null

Timestamp when the collection was last updated. Automatically updated by the system whenever the collection is modified.

document_count

integer | null

Number of documents in the collection

taxonomy_applications

TaxonomyApplicationConfig · object[] | null

NOT REQUIRED. List of taxonomies to apply to documents in this collection. Each entry specifies: taxonomy_id, execution_mode (materialize/on_demand), optional filters. Materialized: Enrichments persisted to documents during ingestion. On-demand: Enrichments computed during retrieval (via taxonomy_join stages). Empty/null if no taxonomies attached. Use for: Categorization, hierarchical classification.

Show child attributes

Examples:

null

[
  {
    "execution_mode": "on_demand",
    "taxonomy_id": "tax_categories"
  },
  {
    "execution_mode": "materialize",
    "taxonomy_id": "tax_brands"
  }
]

taxonomy_count

integer | null

Number of taxonomies connected to this collection

retriever_count

integer | null

Number of retrievers connected to this collection

Health

Organizations

Namespaces

Buckets

Feature Extractors

Collections

Retrievers

Taxonomies

Clusters

Analytics

Tasks

Webhooks

Create Collection

Authorizations

Headers

Body

Response