POST /v1/collections

Create Collection
curl --request POST \
  --url https://api.mixpeek.com/v1/collections \
  --header 'Authorization: Bearer your_api_key' \
  --header 'Content-Type: application/json' \
  --data '{
  "collection_name": "product_embeddings",
  "description": "Generate embeddings from product images and titles",
  "enabled": true,
  "feature_extractors": [
    {
      "feature_extractor_id": "openai_clip_image",
      "input_mappings": {
        "image": "image"
      },
      "parameters": {
        "model": "clip-vit-base-patch32"
      }
    },
    {
      "feature_extractor_id": "openai_embed_text",
      "input_mappings": {
        "text": "metadata.title"
      },
      "parameters": {
        "model": "text-embedding-3-small"
      }
    }
  ],
  "source": {
    "bucket_id": "bkt_12345",
    "type": "bucket"
  }
}'
Example: collection with taxonomy_applications

{
  "collection_name": "products",
  "description": "Product catalog",
  "taxonomy_applications": [
    {
      "execution_mode": "on_demand",
      "taxonomy_id": "tax_categories"
    },
    {
      "execution_mode": "materialize",
      "target_collection_id": "col_products_enriched",
      "taxonomy_id": "tax_brands"
    }
  ]
}

Headers

Authorization
string | null

Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings. Example: 'Bearer sk_1234567890abcdef'

X-Namespace
string | null

Optional namespace for data isolation. This can be a namespace name or namespace ID. Example: 'netflix_prod' or 'ns_1234567890'. To create a namespace, use the /namespaces endpoint.

Body

application/json

Request model for creating a new collection.

Collections process data from buckets or other collections using feature extractors.

CRITICAL: To use input_mappings in feature_extractors:

  1. Your source bucket MUST have a bucket_schema defined
  2. The input_mappings reference fields from that bucket_schema
  3. The system validates that mapped fields exist in the source schema

Example workflow:

  1. Create bucket with schema: { "properties": { "image": {"type": "image"}, "metadata": {...} } }
  2. Upload objects conforming to that schema
  3. Create collection with input_mappings: { "image": "image", "text": "metadata.title" }
  4. The system validates "image" and "metadata.title" exist in the bucket schema

Without a bucket_schema, input_mappings will fail with: "The source field 'X' does not exist in the source schema."
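Putting steps 1 and 3 together, a minimal hypothetical bucket_schema and the input_mappings that reference it could look like the following. The nested shape of metadata and the type name "text" are illustrative assumptions; only the top-level field names mirror the example above:

```json
{
  "bucket_schema": {
    "properties": {
      "image": { "type": "image" },
      "metadata": {
        "type": "object",
        "properties": { "title": { "type": "text" } }
      }
    }
  },
  "input_mappings": {
    "image": "image",
    "text": "metadata.title"
  }
}
```

Here "image" maps to the top-level image field, while "text" uses dot notation to reach the nested metadata.title field.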

collection_name
string
required

Name of the collection to create

source
object
required

Source configuration (bucket or collection) for this collection
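For a bucket source, the object takes the same shape as in the request example above:

```json
{
  "type": "bucket",
  "bucket_id": "bkt_12345"
}
```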

description
string | null

Description of the collection

input_schema
object | null

Input schema for the collection. If not provided, it is inferred from the source bucket's bucket_schema or the source collection's output_schema. REQUIRED for input_mappings to work: it defines which fields can be mapped to feature extractors. Schema definition for bucket objects.

IMPORTANT: The bucket schema defines what fields your bucket objects will have. This schema is REQUIRED if you want to:

  1. Create collections that use input_mappings to process your bucket data
  2. Validate object structure before ingestion
  3. Enable type-safe data pipelines

The schema defines the custom fields that will be used in:

  • Blob properties (e.g., "content", "thumbnail", "transcript")
  • Object metadata structure
  • Blob data structures

Example workflow:

  1. Create bucket WITH schema defining your data structure
  2. Upload objects that conform to that schema
  3. Create collections that map schema fields to feature extractors

Without a bucket_schema, collections cannot use input_mappings.

feature_extractors
FeatureExtractorConfig · object[]

Feature extractors to apply. Use input_mappings in each extractor to map source schema fields to extractor inputs. Example: {"image": "product_image", "text": "metadata.title"}
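A single FeatureExtractorConfig entry, taken from the request example at the top of this page (the extractor ID and model name are examples, not an exhaustive list):

```json
{
  "feature_extractor_id": "openai_clip_image",
  "input_mappings": { "image": "image" },
  "parameters": { "model": "clip-vit-base-patch32" }
}
```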

enabled
boolean
default:true

Whether the collection is enabled

metadata
object | null

Additional metadata for the collection

Response

Successful Response

Response model for collection endpoints.

collection_name
string
required

Collection name

collection_id
string

Unique collection identifier

description
string | null

Collection description

schema
object | null

Collection schema

input_schema
object | null

Input schema for the collection. Schema definition for bucket objects; the bucket schema requirements are described under the request body's input_schema field above.


output_schema
object | null

Output schema after feature extraction; follows the same schema definition format as input_schema.


feature_extractors
FeatureExtractorConfig · object[]

Feature extractors applied to this collection

source_lineage
SingleLineageEntry · object[] | null

Lineage chain showing the processing history

vector_indexes
any[]

Vector indexes for this collection

payload_indexes
any[]

Payload indexes for this collection

enabled
boolean
default:true

Whether the collection is enabled

metadata
object | null

Additional metadata for the collection

taxonomy_applications
TaxonomyApplicationConfig · object[] | null

List of taxonomies applied to this collection
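An entry with execution_mode "materialize" also names a target_collection_id (presumably the collection that receives the enriched documents), as in the example body above:

```json
{
  "execution_mode": "materialize",
  "target_collection_id": "col_products_enriched",
  "taxonomy_id": "tax_brands"
}
```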
