Taxonomy Types
| Type | Structure | When to Use |
|---|---|---|
| Flat | Single-level reference collection | Face enrollment, entity linking, simple lookups |
| Hierarchical | Parent/child nodes with inheritance | Org charts, product categories, multi-level labeling |
Flat Taxonomy: Product Catalog Recognition
In a flat taxonomy, documents from any modality (video, image, audio, text) are matched against a single reference collection. Each document uses its appropriate feature embedding (CLIP for visual content, text embeddings for audio transcripts) to find the best match. Enrichment fields (SKU, category, price) are attached when similarity exceeds the threshold.
Hierarchical Taxonomy: Media Content Classification
In a hierarchical taxonomy, documents traverse multiple levels of progressive refinement. They start at a broad brand classification (1 node), then pass through content category (2 nodes), sport/style type (4 nodes), and audience segmentation (5 nodes), and end at specific campaigns (6 nodes). Each level narrows the classification using different multimodal features: CLIP for brand detection, scene classification for categories, activity detection for sport types, demographic models for audiences, and campaign-specific patterns at the final level. As documents traverse down the tree, they inherit all properties from parent nodes.
Execution Modes
| Mode | Description | Use Case |
|---|---|---|
| `on_demand` | Enrich documents at query time inside a retriever (`taxonomy@v1` stage) | Exploratory workflows, testing, dynamic reference data |
| `materialize` | Batch enrichment after extraction; results persisted in the collection | Production search, low-latency retrieval, analytics |
Apply a taxonomy either through a collection's `taxonomy_applications` array or by adding a taxonomy stage to a retriever.
Internals: JOIN Stage
Taxonomies reuse the `join@v1` stage under the hood:
- Direct join – key-based match (`join_type: "direct"`).
- Retriever join – similarity match using a nested retriever (`join_type: "retriever"`).
- Join strategies – `replace`, `enrich`, `left`, or `append` control how fields merge.
Running lookups concurrently (via `asyncio.gather`) makes retriever joins 10–50× faster than sequential lookups.
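The following is a minimal Python sketch of the two join types. It assumes an in-memory reference collection in place of a real one. `REFERENCE`, `direct_join`, `retriever_lookup`, and `retriever_join` are hypothetical names, not the `join@v1` API. The `asyncio.sleep` call stands in for a retriever round trip. Copying a single `category` field is a simplification, not one of the named join strategies.

```python
import asyncio
import math

# Hypothetical reference collection: a key, an embedding, and one enrichment field.
REFERENCE = [
    {"sku": "A1", "embedding": [1.0, 0.0], "category": "shoes"},
    {"sku": "B2", "embedding": [0.0, 1.0], "category": "hats"},
]
BY_SKU = {ref["sku"]: ref for ref in REFERENCE}


def direct_join(doc: dict) -> dict:
    """Key-based match, as in join_type "direct": exact lookup on a shared key."""
    ref = BY_SKU.get(doc.get("sku"))
    return {**doc, "category": ref["category"]} if ref else doc


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


async def retriever_lookup(doc: dict, threshold: float) -> dict:
    """Similarity match, as in join_type "retriever" (hypothetical stand-in)."""
    await asyncio.sleep(0.05)  # simulates the round trip to a nested retriever
    best = max(REFERENCE, key=lambda r: cosine(doc["embedding"], r["embedding"]))
    if cosine(doc["embedding"], best["embedding"]) >= threshold:
        return {**doc, "category": best["category"]}
    return doc  # below threshold: leave the document unenriched


async def retriever_join(docs: list, threshold: float = 0.8) -> list:
    # asyncio.gather awaits every lookup concurrently, so wall-clock time is
    # roughly one round trip rather than len(docs) round trips.
    return await asyncio.gather(*(retriever_lookup(d, threshold) for d in docs))


if __name__ == "__main__":
    docs = [{"id": 1, "embedding": [0.9, 0.1]}, {"id": 2, "embedding": [0.1, 0.9]}]
    print(asyncio.run(retriever_join(docs)))
    print(direct_join({"id": 3, "sku": "B2"}))
```

Because the simulated lookups overlap inside `asyncio.gather`, a batch of N documents waits roughly one round trip instead of N. The speedup seen in practice would depend on how much concurrency the real retriever backend allows.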
Create a Flat Taxonomy
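The flat-taxonomy matching behavior can be sketched locally. `FlatTaxonomy`, its field names, cosine scoring, and the 0.8 default threshold are assumptions for illustration, not the platform's taxonomy-creation API. Real embeddings would come from the feature extractors (for example CLIP) rather than hand-written vectors.

```python
import math
from dataclasses import dataclass


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


@dataclass
class FlatTaxonomy:
    """Hypothetical local model of a flat taxonomy backed by one reference collection."""
    name: str
    references: list          # each item: {"embedding": [...], plus enrichment fields}
    enrichment_fields: list   # only these fields are copied onto matched documents
    threshold: float = 0.8    # assumed similarity cutoff

    def enrich(self, doc: dict) -> dict:
        # Score the document against every reference entry and keep the best one.
        score, best = max(
            ((cosine(doc["embedding"], ref["embedding"]), ref) for ref in self.references),
            key=lambda pair: pair[0],
        )
        if score < self.threshold:
            return doc  # no confident match: the document passes through unchanged
        return {**doc, **{f: best[f] for f in self.enrichment_fields}}


catalog = FlatTaxonomy(
    name="product-catalog",
    references=[
        {"embedding": [1.0, 0.0], "sku": "A1", "category": "shoes", "price": 59.0},
        {"embedding": [0.0, 1.0], "sku": "B2", "category": "hats", "price": 19.0},
    ],
    enrichment_fields=["sku", "category", "price"],
)
print(catalog.enrich({"id": "frame-7", "embedding": [0.95, 0.05]}))
```

Listing `enrichment_fields` explicitly keeps matched documents small, since only the named fields are copied over.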
Create a Hierarchical Taxonomy
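A hierarchical taxonomy can be sketched as a tree whose nodes pass their properties down to the documents they classify. `Node`, `classify`, and the tag predicates are hypothetical stand-ins for the per-level multimodal models. Two behaviors are also assumptions: a child's property overrides a same-named parent property, and the first matching child wins at each level.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical taxonomy node: a match test plus properties its subtree inherits."""
    name: str
    properties: dict
    matches: Callable[[dict], bool]  # stands in for CLIP, scene, activity models, etc.
    children: list = field(default_factory=list)


def classify(doc: dict, node: Node, inherited: Optional[dict] = None) -> Optional[dict]:
    """Walk down the tree, accumulating properties from every matched level."""
    if not node.matches(doc):
        return None
    # Assumption: a child's property overrides a parent's property of the same name.
    props = {**(inherited or {}), **node.properties}
    for child in node.children:
        deeper = classify(doc, child, props)
        if deeper is not None:
            return deeper  # assumption: the first matching child wins at each level
    return props  # deepest matching node reached


tree = Node("brand", {"brand": "Acme"}, lambda d: "acme" in d["tags"], [
    Node("sports", {"category": "sports"}, lambda d: "sports" in d["tags"], [
        Node("running", {"sport": "running"}, lambda d: "running" in d["tags"]),
    ]),
])
print(classify({"tags": {"acme", "sports", "running"}}, tree))
```

In this sketch, a document tagged only `acme` and `music` stops at the brand level. It still carries `{"brand": "Acme"}` from that level.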
Attach to a Collection
- Materialized enrichment updates documents ~30 seconds after ingestion completes (debounced to avoid thrashing).
- On-demand enrichment keeps documents untouched; retrievers call the taxonomy join at query time.
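Debounced materialization works like a trailing-edge debounce: each ingestion event restarts a timer, and enrichment runs only once the timer expires with no new event. `Debouncer`, `demo`, and `enrich_collection` are hypothetical and not the platform's scheduler. The demo also shortens the ~30 second window to 0.1 seconds.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class Debouncer:
    """Hypothetical trailing-edge debounce: run `action` once, `delay` s after the last trigger."""

    def __init__(self, delay: float, action: Callable[[], Awaitable[None]]):
        self.delay = delay
        self.action = action
        self._task: Optional[asyncio.Task] = None

    def trigger(self) -> None:
        # Must be called from a running event loop (asyncio.create_task requires one).
        # Each new ingestion event cancels the pending run and restarts the timer.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        await asyncio.sleep(self.delay)
        await self.action()


async def demo() -> list:
    runs = []

    async def enrich_collection() -> None:
        runs.append("enriched")

    debouncer = Debouncer(0.1, enrich_collection)  # ~30 s in the text; 0.1 s here
    for _ in range(5):  # a burst of five ingestion batches
        debouncer.trigger()
        await asyncio.sleep(0.02)
    await asyncio.sleep(0.3)  # wait past the final timer
    return runs


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Five rapid triggers produce a single enrichment run, which is the repeated re-enrichment ("thrashing") the debounce avoids.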
Test On Demand
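On-demand testing can be pictured locally as a retriever pipeline whose final stage decorates results at query time while the stored documents stay unchanged. `STORED`, `REFERENCE`, and the stage functions are hypothetical. A real retriever would use the `taxonomy@v1` stage, and the key-based lookup here corresponds only to a direct join.

```python
# Hypothetical stored collection. In on_demand mode nothing is written back to it.
STORED = [
    {"id": "v1", "title": "Trail running promo", "brand_key": "nike"},
    {"id": "v2", "title": "Cooking show", "brand_key": "unknown"},
]
# Hypothetical reference collection keyed for a direct join.
REFERENCE = {"nike": {"brand": "Nike", "category": "sportswear"}}


def search_stage(query: str) -> list:
    """Stand-in for the retriever's earlier stages (for example a vector search)."""
    return [doc for doc in STORED if query.lower() in doc["title"].lower()]


def taxonomy_stage(results: list) -> list:
    """Query-time enrichment: build new result dicts and leave STORED untouched."""
    return [{**doc, **REFERENCE.get(doc["brand_key"], {})} for doc in results]


def run_retriever(query: str) -> list:
    return taxonomy_stage(search_stage(query))


print(run_retriever("running"))
print(STORED[0])  # still has no "brand" field
```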
Inference Strategies
- Manual – Define nodes explicitly (IDs, collections, retrievers).
- Schema-based – Infer nodes from existing collection schemas (planned).
- Cluster-based – Create nodes from clustering output.
- LLM-based – Generate hierarchical structure from sample documents.
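For the cluster-based strategy, one plausible sketch turns cluster assignments into candidate nodes, each represented by its centroid. The node dictionary layout and `nodes_from_clusters` are hypothetical. Treating label `-1` as noise follows the convention of scikit-learn's DBSCAN and HDBSCAN.

```python
from collections import defaultdict
from typing import Optional


def nodes_from_clusters(labels: list, embeddings: list, names: Optional[dict] = None) -> list:
    """Build one hypothetical taxonomy node per cluster, using its centroid as the reference."""
    members = defaultdict(list)
    for label, vec in zip(labels, embeddings):
        if label == -1:  # noise label used by e.g. scikit-learn's DBSCAN; not a cluster
            continue
        members[label].append(vec)

    nodes = []
    for label in sorted(members):
        vecs = members[label]
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]  # per-dimension mean
        nodes.append({
            "node_id": f"cluster-{label}",
            "name": (names or {}).get(label, f"cluster {label}"),
            "centroid": centroid,
            "size": len(vecs),  # useful later for measuring coverage
        })
    return nodes


print(nodes_from_clusters([0, 0, 1, -1], [[1, 0], [3, 0], [0, 2], [9, 9]], {0: "running shoes"}))
```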
Monitoring
- List taxonomies: `POST /v1/taxonomies/list`
- Inspect hierarchy and node metadata: `GET /v1/taxonomies/{id}?expand_nodes=true`
- Track materialized enrichment progress via webhook events (`collection.documents.written`).
- Use retriever analytics to ensure taxonomy stages don’t dominate latency.
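The two monitoring endpoints can be wrapped as standard-library request builders. The endpoint paths come from the list above. `BASE_URL`, `API_KEY`, Bearer authentication, the empty JSON body for the list call, and the example ID `tax_123` are assumptions to replace with your deployment's actual values.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; substitute your deployment's
API_KEY = "YOUR_API_KEY"              # hypothetical; Bearer auth is an assumption


def list_taxonomies_request() -> urllib.request.Request:
    """POST /v1/taxonomies/list (an empty JSON body is assumed to mean no filters)."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/taxonomies/list",
        data=json.dumps({}).encode("utf-8"),
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        method="POST",
    )


def get_taxonomy_request(taxonomy_id: str) -> urllib.request.Request:
    """GET /v1/taxonomies/{id}?expand_nodes=true to inspect hierarchy and node metadata."""
    query = urllib.parse.urlencode({"expand_nodes": "true"})
    return urllib.request.Request(
        f"{BASE_URL}/v1/taxonomies/{urllib.parse.quote(taxonomy_id, safe='')}?{query}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        method="GET",
    )


# To actually send one (network access and valid credentials required):
# with urllib.request.urlopen(get_taxonomy_request("tax_123")) as resp:
#     print(json.load(resp))
```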
Best Practices
- Start flat for quick wins; layer hierarchies once value is proven.
- Keep enrichment minimal—copy only fields needed at query time.
- Cache taxonomy stages in retrievers when reference collections rarely change.
- Version taxonomies (via snapshots) before major structural changes.
- Combine with clusters to discover candidate nodes and measure coverage.
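To cache a taxonomy stage while still honoring versioning, one option is to make the reference-collection version part of the cache key. `CachedTaxonomyStage` and `slow_lookup` are hypothetical helpers built on `functools.lru_cache`. A real retriever may expose caching differently.

```python
from functools import lru_cache
from typing import Callable, Optional


class CachedTaxonomyStage:
    """Hypothetical wrapper that memoizes taxonomy lookups per reference-collection version."""

    def __init__(self, lookup: Callable[[str], Optional[dict]], maxsize: int = 1024):
        # Including the version in the cache key means a version bump
        # bypasses stale entries instead of requiring manual invalidation.
        self._cached = lru_cache(maxsize=maxsize)(lambda key, version: lookup(key))

    def __call__(self, key: str, reference_version: int) -> Optional[dict]:
        # Note: the cached dict is shared between callers; treat it as read-only.
        return self._cached(key, reference_version)


calls = []


def slow_lookup(key: str) -> Optional[dict]:
    calls.append(key)  # records how often the (expensive) reference lookup runs
    return {"nike": {"category": "sportswear"}}.get(key)


stage = CachedTaxonomyStage(slow_lookup)
stage("nike", reference_version=1)
stage("nike", reference_version=1)  # served from cache
stage("nike", reference_version=2)  # new version: looked up again
print(len(calls))  # 2
```

Entries for old versions are never served again and age out of the LRU cache on their own.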

