Get Collection

GET /v1/collections/{collection_identifier}

Example request:

curl --request GET \
  --url https://api.mixpeek.com/v1/collections/{collection_identifier} \
  --header 'Authorization: <authorization>' \
  --header 'X-Namespace: <x-namespace>'

Example response:

{
  "collection_name": "<string>",
  "input_schema": {
    "properties": {}
  },
  "output_schema": {
    "properties": {}
  },
  "feature_extractor": {
    "feature_extractor_name": "<string>",
    "version": "<string>",
    "feature_extractor_id": "<string>",
    "parameters": {},
    "input_mappings": {
      "image": "product_image",
      "text": "title"
    },
    "field_passthrough": [
      {
        "source_path": "<string>",
        "target_path": "title",
        "default": "Unknown",
        "required": false
      }
    ],
    "include_all_source_fields": false
  },
  "source": {
    "type": "bucket",
    "bucket_ids": [
      "bkt_marketing_videos"
    ],
    "collection_id": "col_video_frames",
    "collection_ids": [
      "col_us_products",
      "col_eu_products"
    ],
    "inherited_bucket_ids": [
      "bkt_marketing_videos"
    ],
    "source_filters": {
      "description": "Filter only video content",
      "filters": {
        "AND": [
          {
            "field": "blobs.type",
            "operator": "eq",
            "value": "video"
          }
        ]
      }
    }
  },
  "collection_id": "<string>",
  "description": "Video frames extracted at 1 FPS with CLIP embeddings",
  "source_bucket_schemas": null,
  "source_lineage": [
    {
      "source_config": {
        "type": "bucket",
        "bucket_ids": [
          "bkt_marketing_videos"
        ],
        "collection_id": "col_video_frames",
        "collection_ids": [
          "col_us_products",
          "col_eu_products"
        ],
        "inherited_bucket_ids": [
          "bkt_marketing_videos"
        ],
        "source_filters": {
          "description": "Filter only video content",
          "filters": {
            "AND": [
              {
                "field": "blobs.type",
                "operator": "eq",
                "value": "video"
              }
            ]
          }
        }
      },
      "feature_extractor": {
        "feature_extractor_name": "<string>",
        "version": "<string>",
        "feature_extractor_id": "<string>",
        "parameters": {},
        "input_mappings": {
          "image": "product_image",
          "text": "title"
        },
        "field_passthrough": [
          {
            "source_path": "<string>",
            "target_path": "title",
            "default": "Unknown",
            "required": false
          }
        ],
        "include_all_source_fields": false
      },
      "output_schema": {
        "properties": {}
      }
    }
  ],
  "vector_indexes": [
    "<unknown>"
  ],
  "payload_indexes": [
    "<unknown>"
  ],
  "enabled": true,
  "metadata": {
    "environment": "production",
    "project": "Q4_campaign",
    "team": "data-science"
  },
  "created_at": "2023-11-07T05:31:56Z",
  "updated_at": "2023-11-07T05:31:56Z",
  "document_count": 123,
  "schema_version": 1,
  "last_schema_sync": "2023-11-07T05:31:56Z",
  "schema_sync_enabled": true,
  "taxonomy_applications": null,
  "cluster_applications": null,
  "taxonomy_count": 123,
  "retriever_count": 123
}

Headers

Authorization
string
required

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

X-Namespace
string
required

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'

Path Parameters

collection_identifier
string
required

The ID or name of the collection to retrieve
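
For reference, a minimal Python sketch of the same request shown in the curl example above (using the requests library); the API key, namespace, and collection identifier are placeholders you would replace with your own values:

import requests

# Placeholders - substitute your own API key, namespace, and collection ID or name.
API_KEY = "sk_xxxxxxxxxxxxx"
NAMESPACE = "my-namespace"          # or a namespace ID such as "ns_xxxxxxxxxxxxx"
COLLECTION = "product_embeddings"   # collection_identifier: ID or name

resp = requests.get(
    f"https://api.mixpeek.com/v1/collections/{COLLECTION}",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "X-Namespace": NAMESPACE,
    },
)
resp.raise_for_status()

collection = resp.json()
print(collection["collection_id"], collection["collection_name"])
print("documents:", collection["document_count"])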

Response

Successful Response

Response model for collection endpoints.

collection_name
string
required

REQUIRED. Human-readable name for the collection. Must be unique within the namespace. Used for: Display, lookups (can query by name or ID), organization. Format: Alphanumeric with underscores/hyphens, 3-100 characters. Examples: 'product_embeddings', 'video_frames', 'customer_documents'.

Required string length: 3 - 100
input_schema
BucketSchema · object
required

REQUIRED (auto-computed from source). Input schema defining fields available to the feature extractor. Source: bucket.bucket_schema (if source.type='bucket') OR upstream_collection.output_schema (if source.type='collection'). Determines: Which fields can be used in input_mappings and field_passthrough. This is the 'left side' of the transformation - what data goes IN. Format: BucketSchema with properties dict. Use for: Validating input_mappings, configuring field_passthrough.

output_schema
BucketSchema · object
required

REQUIRED (auto-computed at creation). Output schema defining fields in final documents. Computed as: field_passthrough fields + extractor output fields (deterministic). Known IMMEDIATELY when collection is created - no waiting for documents! This is the 'right side' of the transformation - what data comes OUT. Use for: Understanding document structure, building queries, schema validation. Example: {title (passthrough), embedding (extractor output)} = output_schema.
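
To make the 'left side'/'right side' relationship concrete, here is a purely illustrative sketch in Python; the field names and types are hypothetical, not taken from a real extractor:

# Hypothetical schemas, for illustration only.
input_schema = {                    # the "left side": fields available to the extractor
    "properties": {
        "title": {"type": "text"},
        "product_image": {"type": "image"},
    }
}

output_schema = {                   # the "right side": fields on the final documents
    "properties": {
        "title": {"type": "text"},        # carried through via field_passthrough
        "embedding": {"type": "vector"},  # produced by the feature extractor
    }
}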

feature_extractor
FeatureExtractorConfig · object
required

REQUIRED. Single feature extractor configuration for this collection. Defines: extractor name/version, input_mappings, field_passthrough, parameters. Note: each collection has exactly ONE extractor (earlier versions supported multiple). For multiple extractors, create multiple collections and chain them with collection-to-collection pipelines. Use field_passthrough to include additional source fields beyond extractor outputs.
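
As an illustration, a feature_extractor configuration following the shape in the response example above; the extractor name, version, and mapped field names are hypothetical and should come from your own extractors and source schema:

feature_extractor = {
    "feature_extractor_name": "clip_embedding",  # hypothetical extractor name
    "version": "1.0",                            # hypothetical version
    "parameters": {},
    # Map extractor inputs to fields available in the input_schema.
    "input_mappings": {
        "image": "product_image",
        "text": "title",
    },
    # Carry selected source fields through to the output documents.
    "field_passthrough": [
        {
            "source_path": "title",
            "target_path": "title",
            "default": "Unknown",
            "required": False,
        }
    ],
    "include_all_source_fields": False,
}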

source
SourceConfig · object
required

REQUIRED. Source configuration defining where data comes from. Type 'bucket': Process objects from one or more buckets (tier 1). Type 'collection': Process documents from upstream collection(s) (tier 2+). For multi-bucket sources, all buckets must have compatible schemas. For multi-collection sources, all collections must have compatible schemas. Determines input_schema and enables decomposition trees.
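
Two illustrative source configurations, one per type, mirroring the fields in the response example (the bucket and collection IDs are placeholders):

# Tier 1: process objects from one or more buckets.
bucket_source = {
    "type": "bucket",
    "bucket_ids": ["bkt_marketing_videos"],
    "source_filters": {
        "description": "Filter only video content",
        "filters": {
            "AND": [
                {"field": "blobs.type", "operator": "eq", "value": "video"}
            ]
        },
    },
}

# Tier 2+: process documents from one or more upstream collections.
collection_source = {
    "type": "collection",
    "collection_ids": ["col_us_products", "col_eu_products"],
}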

collection_id
string

NOT REQUIRED (auto-generated). Unique identifier for this collection. Used for: API paths, document queries, pipeline references. Format: 'col_' prefix + 10 random alphanumeric characters. Stable after creation - use for all collection references.

description
string | null

NOT REQUIRED. Human-readable description of the collection's purpose. Use for: Documentation, team communication, UI display. Common pattern: Describe what the collection contains and what processing is applied.

Example:

"Video frames extracted at 1 FPS with CLIP embeddings"

source_bucket_schemas
Source Bucket Schemas · object

NOT REQUIRED (auto-computed). Snapshot of bucket schemas at collection creation. Only populated for multi-bucket collections (source.type='bucket' with multiple bucket_ids). Key: bucket_id, Value: BucketSchema at time of collection creation. Used for: Schema compatibility validation, document lineage, debugging. Schema snapshot is immutable - bucket schema changes after collection creation do not affect this. Single-bucket collections may omit this field (schema in input_schema is sufficient).

Example:

null

source_lineage
SingleLineageEntry · object[]

NOT REQUIRED (auto-computed). Lineage chain showing complete processing history. Each entry contains: source_config, feature_extractor, output_schema for one tier. Length indicates processing depth (1 = tier 1, 2 = tier 2, etc.). Use for: Understanding multi-tier pipelines, visualizing decomposition trees.
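
A small sketch of walking the lineage chain, assuming `collection` is the parsed JSON body returned by this endpoint:

lineage = collection.get("source_lineage") or []
print("processing depth (tiers):", len(lineage))

for tier, entry in enumerate(lineage, start=1):
    source = entry["source_config"]
    extractor = entry["feature_extractor"]
    print(
        f"tier {tier}: source type={source['type']}, "
        f"extractor={extractor['feature_extractor_name']} v{extractor['version']}"
    )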

vector_indexes
any[]

NOT REQUIRED (auto-computed from extractor). Vector indexes for semantic search. Populated from feature_extractor.required_vector_indexes. Defines: Which embeddings are indexed, dimensions, distance metrics. Use for: Understanding search capabilities, debugging vector queries.

payload_indexes
any[]

NOT REQUIRED (auto-computed from extractor + namespace). Payload indexes for filtering. Enables efficient filtering on metadata fields, timestamps, IDs. Populated from: extractor requirements + namespace defaults. Use for: Understanding which fields support fast filtering.

enabled
boolean
default:true

NOT REQUIRED (defaults to True). Whether the collection accepts new documents. False: Collection exists but won't process new objects. True: Active and processing. Use for: Temporarily disabling collections without deletion.

metadata
Metadata · object

NOT REQUIRED. Additional user-defined metadata for the collection. Arbitrary key-value pairs for custom organization, tracking, configuration. Not used by the platform - purely for user purposes. Common uses: team ownership, project tags, deployment environment.

Example:

{
  "environment": "production",
  "project": "Q4_campaign",
  "team": "data-science"
}

created_at
string<date-time> | null

Timestamp when the collection was created. Automatically set by the system when the collection is first saved to the database.

updated_at
string<date-time> | null

Timestamp when the collection was last updated. Automatically updated by the system whenever the collection is modified.

document_count
integer | null

Number of documents in the collection

schema_version
integer
default:1

Version number for the output_schema. Increments automatically when schema is updated via document sampling. Used to track schema evolution and trigger downstream collection schema updates.

last_schema_sync
string<date-time> | null

Timestamp of last automatic schema sync from document sampling. Used to debounce schema updates (prevents thrashing).

schema_sync_enabled
boolean
default:true

Whether automatic schema discovery and sync is enabled for this collection. When True, schema is periodically updated by sampling documents. When False, schema remains fixed at creation time.
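
If downstream code caches the output_schema, schema_version and last_schema_sync can be used to decide when to refresh; a minimal sketch, assuming `collection` is the parsed response and `cached_schema_version` comes from your own storage:

cached_schema_version = 1  # value your system recorded the last time it read this collection

if collection["schema_version"] > cached_schema_version:
    print("output_schema changed; last sync:", collection["last_schema_sync"])
    # Re-read collection["output_schema"] and update downstream consumers here.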

taxonomy_applications
TaxonomyApplicationConfig · object[] | null

NOT REQUIRED. List of taxonomies to apply to documents in this collection. Each entry specifies: taxonomy_id, execution_mode (materialize/on_demand), optional filters. Materialized: Enrichments persisted to documents during ingestion. On-demand: Enrichments computed during retrieval (via taxonomy_join stages). Empty/null if no taxonomies attached. Use for: Categorization, hierarchical classification.

Example:

null
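
An illustrative entry built from the fields listed above; the taxonomy ID is a placeholder and the exact key for the optional filters is an assumption:

taxonomy_applications = [
    {
        "taxonomy_id": "tax_xxxxxxxxxxxxx",  # placeholder ID
        "execution_mode": "materialize",     # or "on_demand"
        "filters": None,                     # assumed key for the optional filters
    }
]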

cluster_applications
ClusterApplicationConfig · object[] | null

NOT REQUIRED. List of clusters to automatically execute when batch processing completes. Each entry specifies: cluster_id, auto_execute_on_batch, min_document_threshold, cooldown_seconds. Clusters enrich source documents with cluster assignments (cluster_id, cluster_label, etc.). Empty/null if no clusters attached. Use for: Segmentation, grouping, pattern discovery.

Example:

null
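
An illustrative entry using the fields named above; the cluster ID and numeric values are placeholders:

cluster_applications = [
    {
        "cluster_id": "clu_xxxxxxxxxxxxx",  # placeholder ID
        "auto_execute_on_batch": True,      # run automatically when batch processing completes
        "min_document_threshold": 100,      # placeholder: minimum documents before executing
        "cooldown_seconds": 3600,           # placeholder: wait between automatic executions
    }
]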

taxonomy_count
integer | null

Number of taxonomies connected to this collection

retriever_count
integer | null

Number of retrievers connected to this collection