Taxonomy Types
| Type | Structure | When to Use |
|---|---|---|
| Flat | Single-level reference collection | Face enrollment, entity linking, simple lookups |
| Hierarchical | Parent/child nodes with inheritance | Org charts, product categories, multi-level labeling |
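
To make the difference between the two structures concrete, here is a minimal Python sketch. The `Node` class, its `fields` attribute, and the rule that a child inherits its ancestors' fields unless it overrides them are illustrative assumptions, not the platform's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical taxonomy node; all names here are illustrative only."""
    node_id: str
    fields: dict = field(default_factory=dict)
    parent: Optional["Node"] = None

    def resolved_fields(self) -> dict:
        """Merge fields from the root down, so a child overrides its ancestors."""
        inherited = self.parent.resolved_fields() if self.parent else {}
        return {**inherited, **self.fields}


# Flat taxonomy: a single level of independent reference entries.
flat = [Node("alice", {"team": "search"}), Node("bob", {"team": "infra"})]

# Hierarchical taxonomy: each child inherits from its parent chain.
electronics = Node("electronics", {"department": "electronics", "tax_class": "standard"})
phones = Node("phones", {"category": "phones"}, parent=electronics)
refurbished = Node("refurbished_phones", {"tax_class": "reduced"}, parent=phones)

print(refurbished.resolved_fields())
```

In the flat case every entry stands alone, so a lookup returns only that entry's fields. In the hierarchical case the deepest node carries everything defined above it, which is what makes multi-level labeling possible.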
Execution Modes
| Mode | Description | Use Case |
|---|---|---|
| on_demand | Enrich documents at query time inside a retriever (taxonomy@v1 stage) | Exploratory workflows, testing, dynamic reference data |
| materialize | Batch enrichment after extraction; results persisted in the collection | Production search, low-latency retrieval, analytics |
Apply a taxonomy through a collection's taxonomy_applications array or by adding a taxonomy stage to a retriever.
Internals: JOIN Stage
Taxonomies reuse the join@v1 stage under the hood:
- Direct join – key-based match (join_type: "direct").
- Retriever join – similarity match using a nested retriever (join_type: "retriever").
- Join strategies – replace, enrich, left, or append control how fields merge.
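
As a rough illustration of a key-based (direct) join, the following Python sketch matches documents against a reference collection by a shared key. The function name `direct_join`, its parameters, and the merge rule (copy reference fields without overwriting existing ones) are assumptions made for this example; they do not describe how any of the named join strategies actually behaves.

```python
def direct_join(documents, reference, doc_key, ref_key, copy_fields):
    """Key-based join: copy selected reference fields onto matching documents.

    Existing document fields are never overwritten here. That merge rule is
    an assumption for this sketch, not the behavior of a specific strategy.
    """
    # Build the lookup once so each document match is a dictionary access.
    index = {ref[ref_key]: ref for ref in reference}
    enriched = []
    for doc in documents:
        match = index.get(doc.get(doc_key))
        merged = dict(doc)  # copy so the input documents stay untouched
        if match is not None:
            for name in copy_fields:
                if name in match and name not in merged:
                    merged[name] = match[name]
        enriched.append(merged)
    return enriched


reference = [
    {"sku": "A1", "category": "phones", "brand": "Acme"},
    {"sku": "B2", "category": "laptops", "brand": "Globex"},
]
documents = [{"id": 1, "sku": "A1"}, {"id": 2, "sku": "Z9"}]

result = direct_join(documents, reference, "sku", "sku", ["category", "brand"])
print(result)
```

Documents with no matching key pass through unchanged, and only the listed fields are copied.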
Concurrent execution (asyncio.gather) makes retriever joins 10–50× faster than sequential lookups.
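
The effect of issuing lookups concurrently with `asyncio.gather` can be illustrated with a small simulation. The `lookup` coroutine below is a stand-in that sleeps instead of calling a real retriever, so the timings show the general pattern only and are not a benchmark of the service.

```python
import asyncio
import time


async def lookup(doc_id: str, delay: float = 0.05) -> dict:
    """Stand-in for one retriever call; the sleep simulates network latency."""
    await asyncio.sleep(delay)
    return {"doc_id": doc_id, "label": f"label-for-{doc_id}"}


async def join_sequential(doc_ids):
    # Each lookup waits for the previous one to finish.
    return [await lookup(d) for d in doc_ids]


async def join_concurrent(doc_ids):
    # gather schedules all lookups at once and preserves input order.
    return await asyncio.gather(*(lookup(d) for d in doc_ids))


async def timed(coro):
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


async def main():
    doc_ids = [f"doc-{i}" for i in range(20)]
    seq, seq_s = await timed(join_sequential(doc_ids))
    con, con_s = await timed(join_concurrent(doc_ids))
    return seq, con, seq_s, con_s


seq, con, seq_s, con_s = asyncio.run(main())
print(f"sequential: {seq_s:.2f}s, concurrent: {con_s:.2f}s")
```

Both variants return the same results in the same order; only the total wall-clock time differs. The real speedup depends on per-call latency, batch size, and any rate limits on the backing retriever.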
Create a Flat Taxonomy
Create a Hierarchical Taxonomy
Attach to a Collection
- Materialized enrichment updates documents ~30 seconds after ingestion completes (debounced to avoid thrashing).
- On-demand enrichment keeps documents untouched; retrievers call the taxonomy join at query time.
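
The debouncing behavior described for materialized enrichment can be sketched as follows. This is a generic debounce pattern written with `asyncio`, not the platform's implementation; the `Debouncer` class is hypothetical, and a 0.2 second delay stands in for the roughly 30 second window so the demo finishes quickly.

```python
import asyncio


class Debouncer:
    """Run `action` once, `delay` seconds after the most recent trigger."""

    def __init__(self, delay: float, action):
        self.delay = delay
        self.action = action
        self._task = None

    def trigger(self):
        # A new event cancels the pending run and restarts the timer.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.create_task(self._run())

    async def _run(self):
        await asyncio.sleep(self.delay)
        self.action()


async def demo():
    runs = []
    # 0.2 s stands in for the ~30 s window so the demo finishes quickly.
    debouncer = Debouncer(0.2, lambda: runs.append("enrich"))
    for _ in range(5):            # five ingestion batches in quick succession
        debouncer.trigger()
        await asyncio.sleep(0.02)
    await asyncio.sleep(0.5)      # wait past the quiet period
    return runs


runs = asyncio.run(demo())
print(runs)
```

Five closely spaced ingestion events lead to a single enrichment run, which is the point of debouncing: the expensive batch job waits for a quiet period instead of restarting on every write.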
Test On Demand
Inference Strategies
- Manual – Define nodes explicitly (IDs, collections, retrievers).
- Schema-based – Infer nodes from existing collection schemas (planned).
- Cluster-based – Create nodes from clustering output.
- LLM-based – Generate hierarchical structure from sample documents.
Monitoring
- List taxonomies: POST /v1/taxonomies/list
- Inspect hierarchy and node metadata: GET /v1/taxonomies/{id}?expand_nodes=true
- Track materialized enrichment progress via webhook events (collection.documents.written)
- Use retriever analytics to ensure taxonomy stages don’t dominate latency.
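
The two monitoring endpoints can be called from any HTTP client. The sketch below only builds the requests with Python's standard library and does not send them. The base URL, the bearer-token header, the empty JSON body for the list call, and the `tax_123` identifier are all placeholders assumed for illustration; only the paths and the `expand_nodes` query parameter come from the list above.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"   # placeholder host, not the real one
API_KEY = "YOUR_API_KEY"               # placeholder credential


def build_list_request() -> urllib.request.Request:
    """POST /v1/taxonomies/list with an empty JSON body (body shape assumed)."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/taxonomies/list",
        data=json.dumps({}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",  # auth scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_inspect_request(taxonomy_id: str) -> urllib.request.Request:
    """GET /v1/taxonomies/{id}?expand_nodes=true for one taxonomy."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/taxonomies/{taxonomy_id}?expand_nodes=true",
        headers={"Authorization": f"Bearer {API_KEY}"},
        method="GET",
    )


list_req = build_list_request()
inspect_req = build_inspect_request("tax_123")  # hypothetical taxonomy ID
print(list_req.get_method(), list_req.full_url)
print(inspect_req.get_method(), inspect_req.full_url)
# To send: urllib.request.urlopen(list_req) once BASE_URL and API_KEY are real.
```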
Best Practices
- Start flat for quick wins; layer hierarchies once value is proven.
- Keep enrichment minimal—copy only fields needed at query time.
- Cache taxonomy stages in retrievers when reference collections rarely change.
- Version taxonomies (via snapshots) before major structural changes.
- Combine with clusters to discover candidate nodes and measure coverage.

