POST /v1/collections/{collection_identifier}/apply-taxonomy
Apply Taxonomy to Existing Documents
curl --request POST \
  --url https://api.mixpeek.com/v1/collections/{collection_identifier}/apply-taxonomy \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --header 'X-Namespace: <x-namespace>' \
  --data '
{
  "taxonomy_id": "<string>",
  "scroll_filters": {
    "must": [
      {
        "key": "metadata.category",
        "match": {
          "value": "products"
        }
      }
    ]
  },
  "batch_size": 1000,
  "parallelism": 4
}
'
{
  "task_id": "<string>",
  "status": "<string>",
  "collection_id": "<string>",
  "taxonomy_id": "<string>",
  "estimated_documents": 123
}

Headers

Authorization
string
required

Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

X-Namespace
string
required

Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace ID or the namespace name. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'.
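
A minimal sketch of assembling the two required headers in Python, following the formats described above. The helper name `build_headers` and the placeholder key and namespace values are illustrative assumptions, not part of the Mixpeek API.

```python
from typing import Dict


def build_headers(api_key: str, namespace: str) -> Dict[str, str]:
    """Build the headers this endpoint requires (hypothetical helper)."""
    if not api_key:
        raise ValueError("api_key is required")
    if not namespace:
        raise ValueError("namespace is required")
    return {
        # The API key is sent as a Bearer token.
        "Authorization": f"Bearer {api_key}",
        # Either a namespace ID (ns_...) or a custom namespace name.
        "X-Namespace": namespace,
        "Content-Type": "application/json",
    }


# Placeholder values for illustration only.
headers = build_headers("sk_xxxxxxxxxxxxx", "my-namespace")
```

Rejecting empty values locally avoids sending a request that cannot be authenticated or scoped.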

Path Parameters

collection_identifier
string
required

Collection ID or name to apply taxonomy to
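
Because the path parameter accepts either an ID or a name, it can help to percent-encode the identifier before placing it in the URL. The sketch below assumes that names may contain characters that are not URL-safe, which this page does not confirm; `apply_taxonomy_url` is a hypothetical helper name.

```python
from urllib.parse import quote

BASE_URL = "https://api.mixpeek.com/v1"


def apply_taxonomy_url(collection_identifier: str) -> str:
    """Return the endpoint URL for a collection ID or name (hypothetical helper)."""
    if not collection_identifier:
        raise ValueError("collection_identifier is required")
    # safe="" also encodes "/" so the identifier stays a single path segment.
    encoded = quote(collection_identifier, safe="")
    return f"{BASE_URL}/collections/{encoded}/apply-taxonomy"


url = apply_taxonomy_url("my collection")
```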

Body

application/json

Request to apply a taxonomy to an existing collection.

This endpoint triggers retroactive taxonomy materialization on all documents in a collection using distributed Ray processing.

Use Cases:
- Apply taxonomy to documents that were ingested before the taxonomy was created
- Re-apply taxonomy after taxonomy configuration changes
- Backfill enrichment data for existing collections

Requirements:
- taxonomy_id is required and must be an existing, valid taxonomy
- The taxonomy must already be attached to the collection via taxonomy_applications
- Documents must exist in the collection

taxonomy_id
string
required

ID of the taxonomy to apply. Must be an existing taxonomy (tax_*) that is already in the collection's taxonomy_applications list.

scroll_filters
Scroll Filters · object

Optional Qdrant filters to limit which documents are enriched. If omitted, all documents in the collection are enriched. Use this to process specific subsets (e.g., documents missing enrichment).

Example:
{
  "must": [
    {
      "key": "metadata.category",
      "match": { "value": "products" }
    }
  ]
}
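
The filter shape in the example above can be generated from a plain mapping of field keys to expected values. This is a sketch that covers only the `must` / `match` form shown on this page; `must_match` is a hypothetical helper, and other Qdrant filter clauses are not covered here.

```python
from typing import Any, Dict


def must_match(conditions: Dict[str, Any]) -> Dict[str, Any]:
    """Build a scroll_filters object requiring every key to match its value."""
    return {
        "must": [
            {"key": key, "match": {"value": value}}
            for key, value in conditions.items()
        ]
    }


# Reproduces the example filter from this page.
scroll_filters = must_match({"metadata.category": "products"})
```
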
batch_size
integer
default:1000

Number of documents to process in each parallel batch. Defaults to 1000. Larger batches mean fewer Ray tasks but more memory per task; smaller batches mean more Ray tasks but less memory per task.

Required range: 100 <= x <= 5000
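
To make the trade-off concrete, the sketch below estimates how many batches a run would produce for a given `batch_size`. It assumes one Ray task per batch, which this page implies but does not state outright; `estimate_batches` is a hypothetical helper and the document count is an arbitrary illustration.

```python
import math


def estimate_batches(total_documents: int, batch_size: int = 1000) -> int:
    """Estimate the number of batches, assuming one task per batch."""
    if not 100 <= batch_size <= 5000:
        raise ValueError("batch_size must be between 100 and 5000")
    if total_documents < 0:
        raise ValueError("total_documents must be non-negative")
    # A final partial batch still counts as a full batch, hence the ceiling.
    return math.ceil(total_documents / batch_size)


# For 250,000 documents: 2500, 250, and 50 batches respectively.
for size in (100, 1000, 5000):
    print(size, estimate_batches(250_000, size))
```
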
parallelism
integer
default:4

Number of parallel Ray workers to use for processing. Defaults to 4. Higher parallelism speeds up processing but consumes more cluster resources, so set it based on available Ray cluster capacity.

Required range: 1 <= x <= 20
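
The body fields and their documented ranges can be checked client-side before the request is sent. The following is a minimal sketch, not an official client: `build_apply_taxonomy_body` is a hypothetical helper, and `tax_example` is a placeholder ID.

```python
import json
from typing import Any, Dict, Optional


def build_apply_taxonomy_body(
    taxonomy_id: str,
    scroll_filters: Optional[Dict[str, Any]] = None,
    batch_size: int = 1000,
    parallelism: int = 4,
) -> Dict[str, Any]:
    """Validate the documented ranges and return the JSON-serializable body."""
    if not taxonomy_id:
        raise ValueError("taxonomy_id is required")
    if not 100 <= batch_size <= 5000:
        raise ValueError("batch_size must be between 100 and 5000")
    if not 1 <= parallelism <= 20:
        raise ValueError("parallelism must be between 1 and 20")
    body: Dict[str, Any] = {
        "taxonomy_id": taxonomy_id,
        "batch_size": batch_size,
        "parallelism": parallelism,
    }
    # scroll_filters is optional; leaving it out targets every document.
    if scroll_filters is not None:
        body["scroll_filters"] = scroll_filters
    return body


payload = json.dumps(build_apply_taxonomy_body("tax_example", batch_size=2000))
```

The resulting `payload` string is what would be sent as the request's JSON data.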

Response

Successful Response

Response from applying taxonomy to collection.

Returns the task ID and status of the materialization task, along with an estimated document count when available.

task_id
string
required

ID of the Ray task executing the materialization

status
string
required

Status of the materialization task

collection_id
string
required

Collection ID where taxonomy is being applied

taxonomy_id
string
required

Taxonomy ID being applied

estimated_documents
integer | null

Estimated number of documents to process (if available)
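
A sketch of reading the response into a typed structure, treating `estimated_documents` as optional because it may be null or absent. The class and function names are hypothetical, and the sample values (including the `"pending"` status) are made up for illustration; this page does not list the possible status values.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApplyTaxonomyResponse:
    task_id: str
    status: str
    collection_id: str
    taxonomy_id: str
    estimated_documents: Optional[int] = None


def parse_apply_taxonomy_response(raw: str) -> ApplyTaxonomyResponse:
    """Parse the JSON response body and check the required fields."""
    data = json.loads(raw)
    required = ("task_id", "status", "collection_id", "taxonomy_id")
    missing = [name for name in required if name not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return ApplyTaxonomyResponse(
        task_id=data["task_id"],
        status=data["status"],
        collection_id=data["collection_id"],
        taxonomy_id=data["taxonomy_id"],
        # May be null or absent when no estimate is available.
        estimated_documents=data.get("estimated_documents"),
    )


# Made-up sample body for illustration only.
sample = '{"task_id": "task_1", "status": "pending", "collection_id": "col_1", "taxonomy_id": "tax_1", "estimated_documents": null}'
result = parse_apply_taxonomy_response(sample)
```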