Taxonomy Types
| Type | Structure | When to Use |
|---|---|---|
| Flat | Single-level reference collection | Face enrollment, entity linking, simple lookups |
| Hierarchical | Parent/child nodes with inheritance | Org charts, product categories, multi-level labeling |
Flat Taxonomy: Product Catalog Recognition
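A flat taxonomy is a single-level lookup. Each incoming document is compared against every entry in one reference collection, and the best match above a threshold contributes its fields. The sketch below models this in plain Python using cosine similarity over toy embeddings. The `REFERENCE_PRODUCTS` data, the `match_flat` helper, and the 0.8 threshold are hypothetical illustrations, not the platform's API.

```python
import math
from typing import Optional

# Hypothetical reference collection: each entry has an embedding plus the
# fields a match should copy onto the document (illustrative data only).
REFERENCE_PRODUCTS = [
    {"embedding": [1.0, 0.0, 0.0], "fields": {"sku": "SHOE-01", "category": "footwear"}},
    {"embedding": [0.0, 1.0, 0.0], "fields": {"sku": "BAG-07", "category": "bags"}},
]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_flat(doc: dict, threshold: float = 0.8) -> dict:
    """Return a copy of `doc` enriched with the best reference match at or above threshold."""
    best: Optional[dict] = None
    best_score = threshold
    for ref in REFERENCE_PRODUCTS:
        score = cosine(doc["embedding"], ref["embedding"])
        if score >= best_score:
            best, best_score = ref, score
    enriched = dict(doc)  # never mutate the caller's document
    if best is not None:
        enriched.update(best["fields"])
        enriched["match_score"] = best_score
    return enriched
```

For example, a document embedded at `[0.9, 0.1, 0.0]` picks up `sku` and `category` from the first entry. A document that sits between both entries (below the threshold for each) passes through unchanged.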
Hierarchical Taxonomy: Media Content Classification
Execution Modes
| Mode | Description | Use Case |
|---|---|---|
| `on_demand` | Enrich documents at query time inside a retriever (`taxonomy_enrich` stage) | Exploratory workflows, testing, dynamic reference data |
| `materialize` | Batch enrichment after extraction; results persisted in the collection | Production search, low-latency retrieval, analytics |
| `retroactive` | Apply taxonomy to existing documents in a collection | Backfilling, taxonomy updates, schema migrations |
Taxonomies are applied through a collection’s `taxonomy_applications` array or by adding a taxonomy stage to a retriever.
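The practical difference between the three modes is *when* enrichment runs and *whether* the stored document changes. The toy in-memory model below shows this. `ToyCollection` and `label_by_length` are hypothetical stand-ins: the enrichment function takes the place of a real taxonomy join.

```python
from typing import Callable

Enricher = Callable[[dict], dict]


def label_by_length(doc: dict) -> dict:
    """Toy enrichment (stands in for a taxonomy join): tag documents by text length."""
    return {"size_label": "long" if len(doc["text"]) > 20 else "short"}


class ToyCollection:
    """In-memory stand-in for a collection; all names here are hypothetical."""

    def __init__(self, enricher: Enricher, materialize: bool = False):
        self.docs: dict[str, dict] = {}
        self.enricher = enricher
        self.materialize = materialize

    def ingest(self, doc_id: str, doc: dict) -> None:
        # materialize: enrichment is computed once and persisted with the document
        stored = dict(doc)
        if self.materialize:
            stored.update(self.enricher(stored))
        self.docs[doc_id] = stored

    def query_on_demand(self) -> list[dict]:
        # on_demand: enrich copies at query time; stored documents stay untouched
        return [{**d, **self.enricher(d)} for d in self.docs.values()]

    def backfill(self) -> int:
        # retroactive: re-apply the (possibly updated) taxonomy to existing documents
        for doc in self.docs.values():
            doc.update(self.enricher(doc))
        return len(self.docs)
```

Materializing pays the enrichment cost once at ingest so that queries stay fast. On-demand pays it on every query but never rewrites stored data.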
How Hierarchical Taxonomies Execute
Hierarchical taxonomies are executed like recursive Common Table Expressions (CTEs) in SQL: each level builds on the results of the previous level, forming a recursive evaluation chain from the root to the leaf nodes. At each level:
- Documents that matched the parent node are passed down.
- The child node’s retriever executes against its reference collection.
- Enrichment fields from matching nodes are accumulated.
- Only documents exceeding the similarity threshold continue to child nodes.
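The steps above can be sketched as a depth-first recursion over a node tree. The sketch assumes a hypothetical `Node` shape where a plain scoring function stands in for each node's retriever and reference collection. A child's enrichment overrides its parent's on key conflicts, which is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical taxonomy node; `score` stands in for the node's retriever."""
    name: str
    score: Callable[[dict], float]
    enrichment: dict
    threshold: float = 0.5
    children: list["Node"] = field(default_factory=list)


def classify(node: Node, docs: list[dict], inherited: Optional[dict] = None) -> list[dict]:
    """Evaluate one node, then recurse into its children with the surviving documents."""
    # Accumulate enrichment along the path; a child's fields override its parent's.
    fields = {**(inherited or {}), **node.enrichment}
    # Only documents that clear this node's threshold are passed down.
    survivors = [d for d in docs if node.score(d) >= node.threshold]
    results: list[dict] = []
    placed: set[str] = set()
    for child in node.children:
        for enriched in classify(child, survivors, fields):
            results.append(enriched)
            placed.add(enriched["id"])
    # Survivors that no child claimed stop here with the fields gathered so far.
    results.extend({**d, **fields} for d in survivors if d["id"] not in placed)
    return results


# Toy media tree: scores are read from precomputed fields on each document.
tree = Node("media", lambda d: 1.0, {"domain": "media"}, children=[
    Node("sports", lambda d: d["sports"], {"genre": "sports"}, children=[
        Node("basketball", lambda d: d["basketball"], {"sport": "basketball"}),
    ]),
    Node("news", lambda d: d["news"], {"genre": "news"}),
])
docs = [
    {"id": "v1", "sports": 0.9, "basketball": 0.8, "news": 0.1},
    {"id": "v2", "sports": 0.7, "basketball": 0.2, "news": 0.3},
    {"id": "v3", "sports": 0.1, "basketball": 0.0, "news": 0.2},
]
results = classify(tree, docs)
```

Here `v1` reaches the basketball leaf with all three levels of fields, `v2` stops at sports, and `v3` keeps only the root's `domain`. In this sketch, a document matching two sibling branches yields one record per branch.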
Application Methods
Hierarchical taxonomies can be applied through three methods:

| Method | When It Runs | Use Case |
|---|---|---|
| On-demand | Query time, as a retriever stage | Dynamic classification, A/B testing taxonomy versions, low-volume queries |
| Materialized | During collection processing (post-extraction) | Production search requiring low latency, analytics dashboards |
| Retroactive | Manually triggered via API | Backfilling existing documents, applying updated taxonomy versions |
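For on-demand application, add a `taxonomy_enrich` stage to your retriever pipeline.
For materialized application, list the taxonomy in the collection’s `taxonomy_applications`.
Trigger a retroactive application when: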
- You’ve updated a taxonomy and need to reclassify existing documents
- You’re migrating from a flat taxonomy to a hierarchical one
- You’ve added new reference data to taxonomy collections
Internals: JOIN Stage
Taxonomies reuse the `join@v1` stage under the hood:
- Direct join – key-based match (`join_type: "direct"`).
- Retriever join – similarity match using a nested retriever (`join_type: "retriever"`).
- Join strategies – `replace`, `enrich`, `left`, or `append` control how fields merge.
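A direct join is a key lookup rather than a similarity search. The sketch below indexes the reference rows by key and merges matches into each document. The merge semantics are assumed from the strategy names and may differ from the platform's actual behavior: `enrich` is modeled as "existing document fields win" and `replace` as "reference fields overwrite". `left` and `append` are not modeled. The `direct_join` helper is hypothetical.

```python
def direct_join(docs: list[dict], reference: list[dict], key: str,
                strategy: str = "enrich") -> list[dict]:
    """Key-based join: attach the reference row whose `key` equals the document's `key`."""
    index = {row[key]: row for row in reference}  # O(1) lookups by join key
    out = []
    for doc in docs:
        row = index.get(doc.get(key))
        if row is None:
            out.append(dict(doc))  # assumption: unmatched documents pass through unchanged
            continue
        extra = {k: v for k, v in row.items() if k != key}
        if strategy == "enrich":
            merged = {**extra, **doc}  # assumed semantics: existing document fields win
        elif strategy == "replace":
            merged = {**doc, **extra}  # assumed semantics: reference fields overwrite
        else:
            raise ValueError(f"strategy not modeled in this sketch: {strategy}")
        out.append(merged)
    return out
```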
Running nested-retriever lookups concurrently (with `asyncio.gather`) makes retrieval joins 10–50× faster than sequential lookups.
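The speedup comes from overlapping I/O waits. The sketch below contrasts awaiting lookups one by one with issuing them all through `asyncio.gather`. `nested_retriever_lookup` is a hypothetical stand-in that sleeps to mimic network latency. The actual ratio depends on per-lookup latency and on how many lookups run in flight at once.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def nested_retriever_lookup(doc_id: str) -> dict:
    """Hypothetical stand-in for one nested-retriever call; sleeps to mimic I/O latency."""
    await asyncio.sleep(0.05)
    return {"doc_id": doc_id, "label": f"label-for-{doc_id}"}


async def join_sequential(doc_ids: list[str]) -> list[dict]:
    # One lookup at a time: total latency grows linearly with the number of documents.
    return [await nested_retriever_lookup(d) for d in doc_ids]


async def join_concurrent(doc_ids: list[str]) -> list[dict]:
    # All lookups in flight at once; gather returns results in input order.
    return await asyncio.gather(*(nested_retriever_lookup(d) for d in doc_ids))


def timed(join: Callable[[list[str]], Awaitable[list[dict]]],
          doc_ids: list[str]) -> tuple[list[dict], float]:
    """Run a join strategy to completion and report wall-clock seconds."""
    start = time.perf_counter()
    result = asyncio.run(join(doc_ids))
    return result, time.perf_counter() - start
```

With ten documents, the sequential version waits for ten back-to-back sleeps. The concurrent version waits roughly as long as a single sleep, and both return identical, order-preserving results.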
Create a Flat Taxonomy
Create a Hierarchical Taxonomy
Attach to a Collection
- Materialized enrichment updates documents ~30 seconds after ingestion completes (debounced to avoid thrashing).
- On-demand enrichment keeps documents untouched; retrievers call the taxonomy join at query time.
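The ~30-second delay works as a debounce. Each write in a burst restarts the timer, so enrichment runs once after ingestion goes quiet instead of once per write. A minimal asyncio sketch of that pattern follows, using a 0.1-second delay in place of the real ~30 seconds. The `Debouncer` class is a hypothetical helper and is not part of the platform.

```python
import asyncio
from typing import Callable, Optional


class Debouncer:
    """Run `action` once, `delay` seconds after the most recent trigger (hypothetical helper)."""

    def __init__(self, delay: float, action: Callable[[], None]):
        self.delay = delay
        self.action = action
        self._task: Optional[asyncio.Task] = None

    def trigger(self) -> None:
        """Call from inside a running event loop for each ingestion event."""
        # A new event cancels the pending run and restarts the quiet period.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.get_running_loop().create_task(self._run_later())

    async def _run_later(self) -> None:
        await asyncio.sleep(self.delay)
        self.action()


async def demo() -> int:
    runs: list[str] = []
    debouncer = Debouncer(0.1, lambda: runs.append("enrich"))
    for _ in range(5):  # a burst of ingestion events, 10 ms apart
        debouncer.trigger()
        await asyncio.sleep(0.01)
    await asyncio.sleep(0.3)  # the quiet period elapses, so enrichment fires once
    return len(runs)
```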
Test On Demand
Inference Strategies
- Manual – Define nodes explicitly (IDs, collections, retrievers).
- Schema-based – Infer nodes from existing collection schemas (planned).
- Cluster-based – Create nodes from clustering output.
- LLM-based – Generate hierarchical structure from sample documents.
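As an illustration of cluster-based inference, one node can be derived per cluster, with the members' centroid serving as that node's reference vector. This sketch assumes each document carries a `cluster_id` and an `embedding`. The node shape and the `nodes_from_clusters` helper are hypothetical and do not reflect the platform's actual output format.

```python
from collections import defaultdict


def nodes_from_clusters(docs: list[dict]) -> list[dict]:
    """Turn clustering output into flat taxonomy nodes (hypothetical node shape)."""
    members: dict[int, list[list[float]]] = defaultdict(list)
    for doc in docs:
        members[doc["cluster_id"]].append(doc["embedding"])
    nodes = []
    for cluster_id, vectors in sorted(members.items()):
        dim = len(vectors[0])
        # Component-wise mean of the member embeddings.
        centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        nodes.append({"node_id": f"cluster-{cluster_id}", "centroid": centroid,
                      "size": len(vectors)})
    return nodes
```

The `size` field gives a rough coverage signal when deciding which clusters deserve a named node.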
Monitoring
- List taxonomies: `POST /v1/taxonomies/list`
- Inspect hierarchy and node metadata: `GET /v1/taxonomies/{id}?expand_nodes=true`
- Track materialized enrichment progress via webhook events (`collection.documents.written`).
- Use retriever analytics to ensure taxonomy stages don’t dominate latency.
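The two monitoring endpoints can be called from the standard library. The sketch below only *builds* the requests; the paths and the `expand_nodes` flag come from the list above. `BASE_URL`, the bearer-token `Authorization` header, and the empty JSON body on the list call are all assumptions to replace with your deployment's actual values.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: substitute your API host


def list_taxonomies_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the list call; the empty JSON body is an assumption."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/taxonomies/list",
        data=json.dumps({}).encode(),
        headers={"Authorization": f"Bearer {api_key}",  # assumed auth scheme
                 "Content-Type": "application/json"},
        method="POST",
    )


def get_taxonomy_request(api_key: str, taxonomy_id: str) -> urllib.request.Request:
    """Build the GET call that expands node metadata for one taxonomy."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/taxonomies/{taxonomy_id}?expand_nodes=true",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="GET",
    )
```

To send one, pass it to `urllib.request.urlopen` and decode the JSON response body.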
Best Practices
- Start flat for quick wins; layer hierarchies once value is proven.
- Keep enrichment minimal—copy only fields needed at query time.
- Cache taxonomy stages in retrievers when reference collections rarely change.
- Version taxonomies (via snapshots) before major structural changes.
- Combine with clusters to discover candidate nodes and measure coverage.
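One way to make cached taxonomy results safe is to include a version of the reference data in the cache key. A change to the reference collection then bumps the version and forces a fresh evaluation instead of reusing stale labels. The sketch below uses `functools.lru_cache`. The `reference_version` counter is an assumed bookkeeping scheme rather than a platform feature, and `taxonomy_stage` is a hypothetical stand-in.

```python
from functools import lru_cache

lookups = {"count": 0}  # counts real (uncached) taxonomy evaluations


@lru_cache(maxsize=1024)
def taxonomy_stage(query: str, reference_version: int) -> str:
    """Hypothetical taxonomy stage; `reference_version` is part of the cache key."""
    lookups["count"] += 1
    return f"labels-for:{query}@v{reference_version}"
```

Repeated queries against the same version hit the cache. After the version changes, the next call re-evaluates.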

