A taxonomy specifies information you want to add to documents. You configure how matches are found (usually a retriever that searches a reference collection) and which fields to copy from the matched reference document. During enrichment, Mixpeek finds the best match and writes the selected fields from the reference document into the target document or the retrieval result.

Overview

  • Flat vs Hierarchical: single‑level joins or multi‑level trees with inheritance
  • Explicit & Implicit: define nodes manually or infer structure (schema/cluster/LLM)
  • Execution Modes: Materialized (batch) or On‑Demand (query‑time)
  • Retriever‑powered: matching is driven by a configured retriever and input mappings

Quick start

  1. Create: define a flat or hierarchical taxonomy with a retriever and input mappings
  2. Attach:
       • Materialize: add it to a collection via taxonomy_applications
       • On‑Demand: add a taxonomy join stage to a retriever
  3. Run:
       • Preview with Execute Taxonomy (on‑demand test)
       • Or execute the retriever pipeline in production
  4. Inspect: verify enriched fields on documents or in results

What gets copied

  • Selected enrichment fields are copied from the matched reference document into your target document or into the retrieval result, depending on execution mode.
  • Use merge_mode: "replace" to overwrite a scalar or object field on the target document.
  • Use merge_mode: "append" to add values to an array field on the target document.
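To make the two merge modes concrete, here is a toy model in plain bash. It only illustrates the semantics described above and is not how Mixpeek applies enrichment internally; the variables are hypothetical stand-ins for metadata.brand and metadata.tags on a target document and its matched reference document. This section does not say whether append removes duplicates, so the example uses non-overlapping values.

```bash
#!/usr/bin/env bash
# Toy model of merge_mode semantics (illustration only; all values are hypothetical).

# Fields on the target document before enrichment.
brand="UnknownCo"
tags=("summer" "outdoor")

# Fields on the matched reference document.
ref_brand="Acme"
ref_tags=("logo" "red")

# merge_mode "replace": the reference value overwrites the target value.
brand="$ref_brand"

# merge_mode "append": the reference values are added to the target's array.
tags+=("${ref_tags[@]}")

echo "brand=$brand"
echo "tags=${tags[*]}"
```

After enrichment the scalar holds only the reference value, while the array keeps its original entries followed by the reference entries.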

How it works

  1. Define: create a taxonomy (flat or hierarchical) and specify a retriever plus input mappings
  2. Attach: add it to a collection (materialize) or to a retriever stage (on‑demand)
  3. Execute: run batch enrichment after extraction, or enrich at query‑time during retrieval

Types and modes

Flat taxonomies:
  • One source collection
  • 1:1 join semantics (best match)
  • Copy configured enrichment fields from the reference document into the target document (replace or append)
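A rough sketch of what "best match" means for a 1:1 join: of all candidate reference documents the retriever returns for one target document, only the top-scoring one is used. The candidate IDs and scores below are made up, and real scoring happens inside the configured retriever, not in a shell pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical retriever results for one target document: "<reference_doc_id> <score>".
candidates="ref_globex 0.87
ref_acme 0.91
ref_initech 0.42"

# 1:1 join: keep only the highest-scoring candidate.
# LC_ALL=C makes sort -n treat "." as the decimal separator.
best=$(printf '%s\n' "$candidates" | LC_ALL=C sort -k2,2 -nr | head -n 1 | cut -d' ' -f1)

echo "best match: $best"
```

Only ref_acme's enrichment fields would be copied; the other candidates are discarded rather than merged.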

Create a taxonomy

  • API: Create Taxonomy
  • Method: POST
  • Path: /v1/taxonomies
  • Reference: API Reference

Flat example

curl -X POST https://api.mixpeek.com/v1/taxonomies \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Namespace: ns_123" \
  -H "Content-Type: application/json" \
  -d '{
    "taxonomy_name": "brands_en",
    "description": "Enrich documents with brand metadata",
    "config": {
      "taxonomy_type": "flat",
      "retriever_id": "ret_clip_v1",
      "input_mappings": [
        {"input_key": "image_vector", "path": "features.clip_vit_l_14", "source_type": "vector"}
      ],
      "source_collection": {
        "collection_id": "col_brands_v1",
        "enrichment_fields": [
          {"field_path": "metadata.brand", "merge_mode": "replace"},
          {"field_path": "metadata.tags",  "merge_mode": "append"}
        ]
      }
    }
  }'
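If you create taxonomies from scripts, it can help to keep the request body in a file and post it with curl's --data-binary @file form. The sketch below wraps the documented POST /v1/taxonomies call in a shell function; the function name create_taxonomy, the DRY_RUN switch, and the file name brands_en.json are illustrative choices, not part of the Mixpeek API. With DRY_RUN unset or set to 1, it only prints the request it would send.

```bash
#!/usr/bin/env bash
# Hypothetical helper around the documented POST /v1/taxonomies endpoint.
# Set DRY_RUN=0 and export API_KEY to actually send the request.
set -euo pipefail

create_taxonomy() {
  local namespace="$1" body_file="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print what would be sent instead of calling the API.
    echo "POST https://api.mixpeek.com/v1/taxonomies (namespace: $namespace)"
    cat "$body_file"
    return 0
  fi
  curl -sS -X POST https://api.mixpeek.com/v1/taxonomies \
    -H "Authorization: Bearer $API_KEY" \
    -H "X-Namespace: $namespace" \
    -H "Content-Type: application/json" \
    --data-binary @"$body_file"
}

# Same body as the flat example above, stored in a file.
cat > brands_en.json <<'EOF'
{
  "taxonomy_name": "brands_en",
  "description": "Enrich documents with brand metadata",
  "config": {
    "taxonomy_type": "flat",
    "retriever_id": "ret_clip_v1",
    "input_mappings": [
      {"input_key": "image_vector", "path": "features.clip_vit_l_14", "source_type": "vector"}
    ],
    "source_collection": {
      "collection_id": "col_brands_v1",
      "enrichment_fields": [
        {"field_path": "metadata.brand", "merge_mode": "replace"},
        {"field_path": "metadata.tags", "merge_mode": "append"}
      ]
    }
  }
}
EOF

create_taxonomy ns_123 brands_en.json
```

Keeping bodies in files also makes it easy to review taxonomy changes in version control before sending them.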

Hierarchical (explicit) example

curl -X POST https://api.mixpeek.com/v1/taxonomies \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Namespace: ns_123" \
  -H "Content-Type: application/json" \
  -d '{
    "taxonomy_name": "people_hierarchy_v1",
    "description": "Employees → Executives",
    "config": {
      "taxonomy_type": "hierarchical",
      "retriever_id": "ret_face_v1",
      "input_mappings": [
        {"input_key": "face_vec", "path": "features.face", "source_type": "vector"}
      ],
      "hierarchical_nodes": [
        {"collection_id": "col_employees_v1"},
        {"collection_id": "col_executives_v1", "parent_collection_id": "col_employees_v1"}
      ]
    }
  }'
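Each entry in hierarchical_nodes names a collection and, optionally, its parent_collection_id; together they describe a tree. The bash sketch below (bash 4+ for associative arrays) walks from a node up to the root to show the chain this example defines. It models only the tree shape; the function name lineage is hypothetical, and how Mixpeek propagates ("inherits") enrichment along the chain is not specified here.

```bash
#!/usr/bin/env bash
# Child -> parent map taken from the hierarchical example (the root has no entry).
declare -A parent=(
  [col_executives_v1]="col_employees_v1"
)

# Print the path from a node up to the root, e.g. "child -> parent".
lineage() {
  local node="$1" path="$1"
  while [ -n "${parent[$node]:-}" ]; do
    node="${parent[$node]}"
    path="$path -> $node"
  done
  echo "$path"
}

lineage col_executives_v1
lineage col_employees_v1
```

A match in col_executives_v1 sits under col_employees_v1, while col_employees_v1 is a root node with no parent.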

Manage taxonomies

Versions

  • Immutable snapshots: Create versioned snapshots for reproducible enrichment
  • Manage: Create Version (/api-reference/taxonomies/create-taxonomy-version), List Versions (/api-reference/taxonomies/list-taxonomy-versions), Get Taxonomy (/api-reference/taxonomies/get-taxonomy)
  • Usage: Pin a specific version when attaching to collections or executing on‑demand to ensure stable outputs; update the reference to roll forward or roll back as needed

Attach and execute

Attach to a collection (materialize)

Include taxonomy_applications when creating or updating a collection. For applications with execution_mode "materialize", the engine writes enrichment after extraction completes.
{
  "collection_name": "ads_v2",
  "taxonomy_applications": [
    {
      "taxonomy_id": "tax_brands_en",
      "execution_mode": "materialize",
      "target_collection_id": "col_ads_enriched"
    },
    {
      "taxonomy_id": "tax_people_hierarchy_v1",
      "execution_mode": "on_demand"
    }
  ]
}

Execute on‑demand (test/preview)

Use the execute endpoint to validate configuration and preview enrichment (on‑demand only).
  • API: Execute Taxonomy
  • Method: POST
  • Path: /v1/taxonomies/execute/{taxonomy_identifier}
  • Reference: API Reference
curl -X POST https://api.mixpeek.com/v1/taxonomies/execute/tax_brands_en \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Namespace: ns_123" \
  -H "Content-Type: application/json" \
  -d '{
    "source_documents": [{"document_id": "doc_123", "features": {"clip_vit_l_14": [0.1,0.2,0.3]}}]
  }'
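When previewing against real documents, you will usually want to build the request body rather than hand-edit it. The sketch below assembles the source_documents payload from a document ID and a list of vector components, then prints the curl call it would make unless DRY_RUN=0. The function names build_execute_body and execute_taxonomy are hypothetical; the endpoint, headers, and body shape follow the example above. The three-number vector is only a placeholder: a real features.clip_vit_l_14 value has the model's full embedding dimensionality.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build {"source_documents":[{"document_id":...,"features":{"clip_vit_l_14":[...]}}]}.
# Assumes the document ID contains no characters that need JSON escaping.
build_execute_body() {
  local doc_id="$1"; shift
  local vec
  # Join the remaining arguments with commas to form the JSON array.
  vec=$(IFS=,; echo "$*")
  printf '{"source_documents":[{"document_id":"%s","features":{"clip_vit_l_14":[%s]}}]}' "$doc_id" "$vec"
}

# Call the Execute Taxonomy endpoint, or just print the request when DRY_RUN=1 (default).
execute_taxonomy() {
  local namespace="$1" taxonomy="$2" body="$3"
  local url="https://api.mixpeek.com/v1/taxonomies/execute/$taxonomy"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "POST $url"
    echo "$body"
    return 0
  fi
  curl -sS -X POST "$url" \
    -H "Authorization: Bearer $API_KEY" \
    -H "X-Namespace: $namespace" \
    -H "Content-Type: application/json" \
    -d "$body"
}

body=$(build_execute_body doc_123 0.1 0.2 0.3)
execute_taxonomy ns_123 tax_brands_en "$body"
```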
For production on‑demand usage, add a taxonomy join stage inside a retriever and call the retriever Execute API.

Best practices

  1. Keep mappings precise: ensure input mappings point to fields that exist in the target documents and have the expected types
  2. Minimize copied fields: copy only the enrichment fields you need; prefer append for array fields
  3. Choose the right mode: use materialize for stable, high‑QPS paths and on‑demand for dynamic or exploratory flows
  4. Iterate hierarchies: start with explicit nodes, then add inference or overrides as needed
