Apply a taxonomy to all existing documents in a collection retroactively.
This endpoint triggers distributed Ray processing to enrich existing documents with taxonomy data. Unlike automatic materialization, which happens during ingestion, this endpoint lets you enrich documents that are already stored in a collection.
See Collections API and Taxonomies API documentation for details.
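As a rough mental model of the distributed processing, assume each batch of documents becomes one Ray task and that at most "parallelism" tasks run at the same time. The Python sketch below estimates how many batches and scheduling rounds a run needs, using the batch-size and parallelism defaults and limits listed with the request fields. plan_batches and BatchPlan are hypothetical helpers, not part of any Mixpeek SDK, and real Ray scheduling is more dynamic than fixed rounds.

```python
import math
from dataclasses import dataclass


@dataclass
class BatchPlan:
    batches: int  # number of batches (assumed: one Ray task per batch)
    waves: int    # scheduling rounds if at most `parallelism` tasks run at once


def plan_batches(num_documents: int, batch_size: int = 1000, parallelism: int = 4) -> BatchPlan:
    """Rough sizing model for a retroactive taxonomy run (an approximation only)."""
    if num_documents < 0:
        raise ValueError("num_documents must be non-negative")
    # Limits mirror the documented request constraints.
    if not 100 <= batch_size <= 5000:
        raise ValueError("batch_size must be between 100 and 5000")
    if not 1 <= parallelism <= 20:
        raise ValueError("parallelism must be between 1 and 20")
    batches = math.ceil(num_documents / batch_size)
    waves = math.ceil(batches / parallelism)
    return BatchPlan(batches=batches, waves=waves)


if __name__ == "__main__":
    print(plan_batches(25_000))
    print(plan_batches(25_000, batch_size=5000))
```

Under this model, 25,000 documents with the defaults split into 25 batches, or about 7 rounds with 4 workers; raising the batch size to 5000 cuts that to 5 batches in 2 rounds, at the cost of more memory per task.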
REQUIRED: Bearer token authentication using your API key, sent in the Authorization header. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.
REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or the namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name such as 'my-namespace'.
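A minimal Python sketch of assembling the two required headers. The Authorization value follows the 'Bearer sk_...' format described above; the namespace header name X-Namespace is an assumption made for illustration, so confirm the exact name in the API reference. build_headers is a hypothetical helper.

```python
# Assumed header name for the namespace; not confirmed by this page.
NAMESPACE_HEADER = "X-Namespace"


def build_headers(api_key: str, namespace: str) -> dict[str, str]:
    """Build the authentication and namespace headers for a request."""
    if not api_key.startswith("sk_"):
        raise ValueError("API keys are expected to look like 'sk_...'")
    if not namespace:
        raise ValueError("a namespace ID (ns_...) or name is required")
    return {
        "Authorization": f"Bearer {api_key}",
        NAMESPACE_HEADER: namespace,  # either an ns_... ID or a custom name
        "Content-Type": "application/json",
    }
```

Validating the key prefix client-side only catches obvious mistakes such as passing the raw token with a "Bearer " prefix already attached; the server remains the authority on whether a key is valid.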
Collection ID or name to apply the taxonomy to.
Request to apply a taxonomy to an existing collection.
This endpoint triggers retroactive taxonomy materialization on all documents in a collection using distributed Ray processing.
Use Cases:
- Apply a taxonomy to documents that were ingested before the taxonomy was created
- Re-apply a taxonomy after its configuration changes
- Backfill enrichment data for existing collections
Requirements:
- taxonomy_id is REQUIRED and must reference an existing, valid taxonomy
- The taxonomy must already be attached to the collection via taxonomy_applications
- Documents must exist in the collection
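These requirements can be checked client-side before triggering a run. The Python sketch below assumes a hypothetical collection record shaped like {"taxonomy_applications": [{"taxonomy_id": ...}], "document_count": ...}; the real Collections API response may use different field names. preflight_problems is a hypothetical helper that returns a list of human-readable problems, empty when all checks pass.

```python
from typing import Any


def preflight_problems(taxonomy_id: str, collection: dict[str, Any]) -> list[str]:
    """Check the documented requirements against an assumed collection record."""
    problems: list[str] = []
    # Taxonomy IDs are documented as tax_*.
    if not taxonomy_id.startswith("tax_"):
        problems.append("taxonomy_id should be a taxonomy ID (tax_*)")
    # The taxonomy must already be attached via taxonomy_applications.
    attached = {
        app.get("taxonomy_id") for app in collection.get("taxonomy_applications", [])
    }
    if taxonomy_id not in attached:
        problems.append("taxonomy is not in the collection's taxonomy_applications")
    # There must be documents to enrich.
    if collection.get("document_count", 0) <= 0:
        problems.append("collection has no documents to enrich")
    return problems
```

Returning all problems at once, rather than raising on the first, makes it easier to fix a misconfigured collection in one pass.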
ID of the taxonomy to apply. REQUIRED. Must be an existing taxonomy (tax_*). The taxonomy must already be in the collection's taxonomy_applications list.
Qdrant filter to limit which documents are enriched. NOT REQUIRED. If not provided, all documents in the collection are enriched. Use it to process specific subsets (e.g., documents missing enrichment). Example:
{
"must": [
{
"key": "metadata.category",
"match": { "value": "products" }
}
]
}
Number of documents to process in each parallel batch. NOT REQUIRED. Defaults to 1000. Larger batches mean fewer Ray tasks but more memory per task; smaller batches mean more Ray tasks but less memory per task.
Range: 100 to 5000 (inclusive).
Number of parallel Ray workers to use for processing. NOT REQUIRED. Defaults to 4. Higher parallelism speeds up processing but consumes more cluster resources, so set it based on available Ray cluster capacity.
Range: 1 to 20 (inclusive).
Successful Response
Response from applying taxonomy to collection.
Returns statistics about the materialization process.
ID of the Ray task executing the materialization
Status of the materialization task
Collection ID where taxonomy is being applied
Taxonomy ID being applied
Estimated number of documents to process (if available)
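A sketch of turning the JSON response into a typed object in Python. The field names task_id, status, collection_id, taxonomy_id and estimated_documents are assumptions that mirror the descriptions above, not confirmed schema keys. estimated_documents is optional because the estimate may be unavailable. ApplyTaxonomyResult and parse_apply_taxonomy_response are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApplyTaxonomyResult:
    task_id: str          # ID of the Ray task executing the materialization
    status: str           # status of the materialization task
    collection_id: str    # collection where the taxonomy is being applied
    taxonomy_id: str      # taxonomy being applied
    estimated_documents: Optional[int] = None  # absent when no estimate is available


def parse_apply_taxonomy_response(raw: str) -> ApplyTaxonomyResult:
    """Parse a response body, assuming the hypothetical field names above."""
    data = json.loads(raw)
    return ApplyTaxonomyResult(
        task_id=data["task_id"],
        status=data["status"],
        collection_id=data["collection_id"],
        taxonomy_id=data["taxonomy_id"],
        # .get() returns None when the estimate is missing.
        estimated_documents=data.get("estimated_documents"),
    )
```

Keeping the task ID from this object is what lets a client follow up on the long-running materialization later, since the response describes a task that has been started rather than finished work.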