Collections are where processed documents live. They define how outputs are stored after feature extraction and serve as the main retrieval surface.

Overview

  • What they store: Structured documents produced by feature extractors.
  • Inputs: Ingest either from a bucket (raw objects) or from another collection, as set by source.type.
  • Consistency: All documents in a collection share a schema determined by the collection's configured feature extractors.
  • Retrieval: Retrievers query one or more collections to return ranked results.

Collection model

Typical fields when creating a collection. See the Create Collection endpoint for the full request schema.
{
  "collection_name": "products_v1",
  "description": "Product catalog (images + text)",
  "source": {"type": "BUCKET", "bucket_id": "bkt_123"},
  "feature_extractors": [
    {
      "feature_extractor_name": "gte_modernbert_base",
      "version": "1.0.0"
    }
  ],
  "metadata": {"env": "prod"}
}
  • source.type: BUCKET or COLLECTION.
  • feature_extractors: The models that generate vectors and fields for each document; each entry names an extractor and the version to use.
  • metadata: User-defined metadata attached to the collection record.
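The sketch below assembles the request body shown above in Python and checks the constraints listed for it. build_collection_payload is a hypothetical helper, not part of any Mixpeek SDK. Requiring at least one extractor, and a name plus version on every extractor entry, are assumptions drawn from the example rather than documented rules.

```python
import json

# Allowed values for source.type, per the field list above.
SOURCE_TYPES = {"BUCKET", "COLLECTION"}


def build_collection_payload(name, source, feature_extractors, description=None, metadata=None):
    """Assemble a create-collection body (hypothetical helper, not an SDK call)."""
    if source.get("type") not in SOURCE_TYPES:
        raise ValueError(
            f"source.type must be one of {sorted(SOURCE_TYPES)}, got {source.get('type')!r}"
        )
    # Assumption: at least one extractor is needed, since it defines the schema.
    if not feature_extractors:
        raise ValueError("at least one feature extractor is required")
    # Assumption: every entry carries both a name and a version, as in the example.
    for fx in feature_extractors:
        if "feature_extractor_name" not in fx or "version" not in fx:
            raise ValueError("each extractor needs feature_extractor_name and version")
    payload = {
        "collection_name": name,
        "source": source,
        "feature_extractors": feature_extractors,
    }
    # Optional fields are only included when provided.
    if description is not None:
        payload["description"] = description
    if metadata is not None:
        payload["metadata"] = metadata
    return payload


body = build_collection_payload(
    "products_v1",
    {"type": "BUCKET", "bucket_id": "bkt_123"},
    [{"feature_extractor_name": "gte_modernbert_base", "version": "1.0.0"}],
    description="Product catalog (images + text)",
    metadata={"env": "prod"},
)
print(json.dumps(body, indent=2))
```

Validating locally like this only catches shape mistakes early; the API remains the authority on which fields are required.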

Create a collection

  • API: Create Collection
  • Method: POST
  • Path: /v1/collections
  • Reference: API Reference
curl -X POST https://api.mixpeek.com/v1/collections \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Namespace: ns_123" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "products_v1",
    "description": "Product catalog (images + text)",
    "source": {"type": "BUCKET", "bucket_id": "bkt_123"},
    "feature_extractors": [
      {"feature_extractor_name": "gte_modernbert_base", "version": "1.0.0"}
    ]
  }'
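For callers working from Python rather than a shell, the same POST can be built with the standard library. This is a minimal sketch using urllib.request; create_collection_request is a hypothetical name, and the request is only constructed, not sent, so the block runs without network access or a real key.

```python
import json
import os
import urllib.request

API_BASE = "https://api.mixpeek.com"


def create_collection_request(payload, api_key, namespace):
    """Build (but do not send) the POST /v1/collections request from the curl example."""
    return urllib.request.Request(
        url=f"{API_BASE}/v1/collections",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace,
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "collection_name": "products_v1",
    "description": "Product catalog (images + text)",
    "source": {"type": "BUCKET", "bucket_id": "bkt_123"},
    "feature_extractors": [
        {"feature_extractor_name": "gte_modernbert_base", "version": "1.0.0"}
    ],
}
req = create_collection_request(payload, os.environ.get("API_KEY", ""), "ns_123")

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Separating construction from sending keeps the request easy to inspect and test before it reaches the API.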

Describe available features

Lists the feature addresses and metadata configured for a collection.
  • API: Describe Collection Features
  • Method: GET
  • Path: /v1/collections/{collection_identifier}/features
  • Reference: API Reference
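A minimal sketch of building the GET request for this endpoint with Python's standard library. It assumes the same Authorization and X-Namespace headers as the create example apply here; describe_features_request is a hypothetical helper. The identifier is percent-encoded so it always occupies a single path segment.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.mixpeek.com"


def describe_features_request(collection_identifier, api_key, namespace):
    """Build (but do not send) GET /v1/collections/{collection_identifier}/features."""
    # safe="" also encodes "/", so the identifier cannot split the path.
    segment = urllib.parse.quote(collection_identifier, safe="")
    return urllib.request.Request(
        f"{API_BASE}/v1/collections/{segment}/features",
        headers={"Authorization": f"Bearer {api_key}", "X-Namespace": namespace},
        method="GET",
    )


req = describe_features_request("products_v1", "YOUR_API_KEY", "ns_123")
print(req.get_method(), req.full_url)
```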


Documents in collections

Documents are created by pipelines and stored in the target collection. Use the Collection Documents APIs to read and update them.

Used by

  • Retrievers: Query collections for search results.
  • Taxonomies: Enrich documents using similarity joins; configured against collection IDs.
  • Clusters: Build and analyze clusters over vectors from one or more collections.

Behavior & validation

  • Schema: Determined by feature extractor outputs; documents must conform.
  • Lineage: Documents maintain references to the source object and processing metadata.
  • Enablement: Collections can be enabled/disabled via the update endpoint.
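The schema rule above can be illustrated with a small local check. This sketch assumes the schema is simply the union of field names each configured extractor outputs; the extractor-to-fields mapping and every field name in it are hypothetical, and real validation may also check types and shapes.

```python
from typing import Dict, Iterable, List, Set


def collection_schema(extractor_outputs: Dict[str, Set[str]], configured: Iterable[str]) -> Set[str]:
    """Union of the output fields of each configured extractor (assumed schema model)."""
    fields: Set[str] = set()
    for name in configured:
        fields |= extractor_outputs[name]
    return fields


def missing_fields(document: dict, schema: Set[str]) -> List[str]:
    """Schema fields absent from the document, sorted for stable output."""
    return sorted(f for f in schema if f not in document)


# Hypothetical extractor outputs, for illustration only.
outputs = {
    "text_extractor": {"text_embedding"},
    "image_extractor": {"image_embedding", "image_caption"},
}
schema = collection_schema(outputs, ["text_extractor", "image_extractor"])
doc = {"text_embedding": [0.1, 0.2]}
print(missing_fields(doc, schema))
```

A document that yields an empty list conforms under this simplified model.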

See also