The Taxonomy Enrich stage classifies documents against predefined taxonomies, adding structured category labels and hierarchical classifications to your search results.
Stage Category: APPLY (enriches documents with classifications)
Transformation: N documents → N documents (with taxonomy labels added)
When to Use
| Use Case | Description |
| --- | --- |
| Content categorization | Auto-classify documents into topics |
| Faceted search | Add filterable category facets |
| Compliance tagging | Apply regulatory classifications |
| Product taxonomy | Classify into product hierarchies |
When NOT to Use
| Scenario | Recommended Alternative |
| --- | --- |
| Free-form tagging | llm_enrichment |
| Pre-classified content | Skip this stage |
| Custom classification logic | api_call to custom service |
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| taxonomy_id | string | Required | ID of the taxonomy to use |
| content_field | string | content | Field to classify |
| result_field | string | taxonomy | Field for classification results |
| max_depth | integer | null | Maximum hierarchy depth |
| top_k | integer | 3 | Number of top classifications |
| min_confidence | float | 0.5 | Minimum confidence threshold |
| include_ancestors | boolean | true | Include parent categories |
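To make the defaults concrete, here is a minimal Python sketch that fills in the documented default values for any parameter you leave out. The helper name resolve_parameters is hypothetical and not part of the product; the stage applies its own defaults server-side, so this only illustrates what an omitted parameter resolves to.

```python
# Defaults copied from the parameter table.
# taxonomy_id has no default because it is required.
DEFAULTS = {
    "content_field": "content",
    "result_field": "taxonomy",
    "max_depth": None,
    "top_k": 3,
    "min_confidence": 0.5,
    "include_ancestors": True,
}


def resolve_parameters(params: dict) -> dict:
    """Hypothetical helper: return params with the documented defaults filled in."""
    if "taxonomy_id" not in params:
        raise ValueError("taxonomy_id is required")
    unknown = set(params) - set(DEFAULTS) - {"taxonomy_id"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Explicit values win over defaults.
    return {**DEFAULTS, **params}


resolved = resolve_parameters({"taxonomy_id": "product_categories", "top_k": 1})
print(resolved["result_field"], resolved["top_k"], resolved["min_confidence"])
```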
Configuration Examples
Basic Classification
{
  "stage_type": "apply",
  "stage_id": "taxonomy_enrich",
  "parameters": {
    "taxonomy_id": "product_categories",
    "content_field": "content",
    "result_field": "categories"
  }
}
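The same stage can be tuned for stricter or broader classification by adding the optional parameters. The Python sketch below derives three variants from the basic configuration: high-confidence only, hierarchical with ancestors, and multiple classifications. The variant helper is hypothetical, and the numeric values are illustrative assumptions rather than recommended settings.

```python
import json

# The basic configuration, expressed as a Python dict.
BASE = {
    "stage_type": "apply",
    "stage_id": "taxonomy_enrich",
    "parameters": {
        "taxonomy_id": "product_categories",
        "content_field": "content",
        "result_field": "categories",
    },
}


def variant(**overrides) -> dict:
    """Hypothetical helper: copy BASE and merge extra parameters into it."""
    return {**BASE, "parameters": {**BASE["parameters"], **overrides}}


# Illustrative values only; tune them against your own taxonomy.
high_confidence = variant(top_k=1, min_confidence=0.8)
hierarchical = variant(include_ancestors=True, max_depth=3)
multiple = variant(top_k=5, min_confidence=0.3)

print(json.dumps(high_confidence, indent=2))
```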
Output Schema
Basic Classification
{
  "document_id": "doc_123",
  "content": "Latest smartphone with 5G connectivity...",
  "categories": {
    "primary": {
      "id": "electronics.mobile.smartphones",
      "name": "Smartphones",
      "confidence": 0.95
    },
    "all": [
      { "id": "electronics.mobile.smartphones", "name": "Smartphones", "confidence": 0.95 },
      { "id": "electronics.mobile", "name": "Mobile Devices", "confidence": 0.82 },
      { "id": "electronics", "name": "Electronics", "confidence": 0.78 }
    ]
  }
}
With Ancestors
{
  "document_id": "doc_456",
  "content": "Investment banking services...",
  "industry": {
    "primary": {
      "id": "finance.banking.investment",
      "name": "Investment Banking",
      "confidence": 0.91
    },
    "ancestors": [
      { "id": "finance.banking", "name": "Banking", "level": 2 },
      { "id": "finance", "name": "Finance", "level": 1 }
    ],
    "path": "Finance > Banking > Investment Banking"
  }
}
Low Confidence (No Match)
{
  "document_id": "doc_789",
  "content": "Random unrelated content...",
  "categories": {
    "primary": null,
    "all": [],
    "message": "No classifications above confidence threshold"
  }
}
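Because primary is null when nothing clears the threshold, downstream code should not assume a match exists. A minimal Python sketch of a defensive reader follows; the function name primary_category_id is hypothetical, and the documents are trimmed copies of the output examples above.

```python
from typing import Optional


def primary_category_id(doc: dict, result_field: str = "taxonomy") -> Optional[str]:
    """Return the primary category ID, or None when there was no match."""
    result = doc.get(result_field) or {}
    primary = result.get("primary")
    return primary["id"] if primary else None


matched = {
    "document_id": "doc_123",
    "categories": {
        "primary": {
            "id": "electronics.mobile.smartphones",
            "name": "Smartphones",
            "confidence": 0.95,
        },
        "all": [],
    },
}
unmatched = {
    "document_id": "doc_789",
    "categories": {
        "primary": None,
        "all": [],
        "message": "No classifications above confidence threshold",
    },
}

print(primary_category_id(matched, "categories"))
print(primary_category_id(unmatched, "categories"))
```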
Taxonomy Structure
Taxonomies are hierarchical classification systems:
Electronics
├── Mobile Devices
│   ├── Smartphones
│   ├── Tablets
│   └── Wearables
├── Computers
│   ├── Laptops
│   ├── Desktops
│   └── Components
└── Audio
    ├── Headphones
    └── Speakers
Each node has:
ID: Dot-notation path (e.g., electronics.mobile.smartphones)
Name: Human-readable label
Level: Depth in hierarchy (1 = root)
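Since a node ID encodes its path, ancestor IDs and levels can be recovered from the ID alone. The Python sketch below assumes that each dot-separated segment corresponds to exactly one hierarchy level, which the examples in this page suggest but do not guarantee. The function name ancestors is hypothetical, and human-readable names would still need a lookup against the taxonomy itself.

```python
def ancestors(node_id: str) -> list[dict]:
    """List the ancestors of a dot-notation node ID, nearest parent first.

    Assumes one hierarchy level per dot-separated segment (1 = root).
    """
    parts = node_id.split(".")
    return [
        {"id": ".".join(parts[:level]), "level": level}
        for level in range(len(parts) - 1, 0, -1)
    ]


print(ancestors("electronics.mobile.smartphones"))
```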
Performance
| Metric | Value |
| --- | --- |
| Latency | 10-50ms per document |
| Batch processing | Automatic |
| Model type | Embedding-based classification |
| Parallel execution | Up to 20 concurrent |
Pre-compute taxonomy embeddings for faster classification. Use top_k: 1 and a higher min_confidence when you only need the best match.
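To show how top_k and min_confidence interact in an embedding-based classifier, here is a toy Python sketch. It is not the stage's implementation: using cosine similarity as the confidence score is an assumption, the classify function is hypothetical, and the 2-D vectors are made up so the arithmetic is easy to follow.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(doc_vec, node_vecs, top_k=3, min_confidence=0.5):
    """Score every node, drop scores below the threshold, keep the best top_k."""
    scored = [(node_id, cosine(doc_vec, vec)) for node_id, vec in node_vecs.items()]
    kept = [(node_id, score) for node_id, score in scored if score >= min_confidence]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Pre-computed node embeddings (toy values, assumed for illustration).
NODE_VECS = {
    "electronics.mobile.smartphones": [1.0, 0.0],
    "electronics.audio.headphones": [0.0, 1.0],
    "electronics.computers.laptops": [0.6, 0.8],
}

doc_vec = [0.9, 0.1]
best_only = classify(doc_vec, NODE_VECS, top_k=1, min_confidence=0.7)
default_run = classify(doc_vec, NODE_VECS)
print(best_only)
print(default_run)
```

With the stricter settings only the smartphone node survives; with the documented defaults the laptop node also clears the 0.5 threshold and is returned second.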
Common Pipeline Patterns
Search + Classify + Filter
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "taxonomy_enrich",
    "parameters": {
      "taxonomy_id": "product_categories",
      "result_field": "category",
      "top_k": 1,
      "min_confidence": 0.7
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "structured_filter",
    "parameters": {
      "conditions": {
        "field": "category.primary.id",
        "operator": "starts_with",
        "value": "{{INPUT.category_filter}}"
      }
    }
  }
]
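The final filter works because node IDs are dot-notation paths: a prefix match on category.primary.id selects a whole subtree. The Python sketch below imitates that behavior locally. Both helper functions are hypothetical, and dropping documents whose primary is null is an assumption about how a starts_with condition treats missing values.

```python
def get_path(doc, dotted):
    """Follow a dotted field path through nested dicts; None if any step is missing."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def starts_with_filter(docs, field, prefix):
    """Keep documents whose string value at `field` begins with `prefix`."""
    kept = []
    for doc in docs:
        value = get_path(doc, field)
        if isinstance(value, str) and value.startswith(prefix):
            kept.append(doc)
    return kept


docs = [
    {"document_id": "a", "category": {"primary": {"id": "electronics.mobile.smartphones"}}},
    {"document_id": "b", "category": {"primary": {"id": "electronics.audio.headphones"}}},
    {"document_id": "c", "category": {"primary": None}},  # no match above threshold
]

mobile = starts_with_filter(docs, "category.primary.id", "electronics.mobile")
print([doc["document_id"] for doc in mobile])
```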
Multi-Taxonomy Classification
[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 50
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "taxonomy_enrich",
    "parameters": {
      "taxonomy_id": "topics",
      "result_field": "topic"
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "taxonomy_enrich",
    "parameters": {
      "taxonomy_id": "sentiment",
      "result_field": "sentiment"
    }
  }
]
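Each taxonomy_enrich stage in this pattern writes to its own result_field. This page does not say what happens when two stages share a field, so it is safest to keep them distinct. The Python sketch below is a hypothetical lint check for that; it treats an omitted result_field as the documented default, taxonomy.

```python
from collections import Counter


def result_field_collisions(pipeline):
    """Return result_field names used by more than one taxonomy_enrich stage."""
    fields = [
        stage.get("parameters", {}).get("result_field", "taxonomy")
        for stage in pipeline
        if stage.get("stage_id") == "taxonomy_enrich"
    ]
    return sorted(name for name, count in Counter(fields).items() if count > 1)


ok_pipeline = [
    {"stage_id": "taxonomy_enrich", "parameters": {"taxonomy_id": "topics", "result_field": "topic"}},
    {"stage_id": "taxonomy_enrich", "parameters": {"taxonomy_id": "sentiment", "result_field": "sentiment"}},
]
# Both stages omit result_field, so both fall back to the default field.
risky_pipeline = [
    {"stage_id": "taxonomy_enrich", "parameters": {"taxonomy_id": "topics"}},
    {"stage_id": "taxonomy_enrich", "parameters": {"taxonomy_id": "sentiment"}},
]

print(result_field_collisions(ok_pipeline))
print(result_field_collisions(risky_pipeline))
```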
Faceted Search Results
[
  {
    "stage_type": "filter",
    "stage_id": "hybrid_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "taxonomy_enrich",
    "parameters": {
      "taxonomy_id": "categories",
      "result_field": "facets",
      "top_k": 3,
      "include_ancestors": true
    }
  }
]
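Once documents carry classifications, a client can turn them into facet counts for a filter sidebar. The following Python sketch counts category IDs across a result set. The facet_counts helper is hypothetical, and it assumes each document's result field contains an all list shaped like the Basic Classification output.

```python
from collections import Counter


def facet_counts(docs, result_field="facets"):
    """Count how many documents were assigned each category ID."""
    counts = Counter()
    for doc in docs:
        for entry in (doc.get(result_field) or {}).get("all", []):
            counts[entry["id"]] += 1
    return counts


results = [
    {"document_id": "a", "facets": {"all": [{"id": "electronics.mobile"}, {"id": "electronics"}]}},
    {"document_id": "b", "facets": {"all": [{"id": "electronics.audio"}, {"id": "electronics"}]}},
    {"document_id": "c", "facets": {"primary": None, "all": []}},  # no match
]

counts = facet_counts(results)
for category_id, count in counts.most_common():
    print(category_id, count)
```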
Error Handling
| Error | Behavior |
| --- | --- |
| Unknown taxonomy_id | Stage fails |
| No match found | Empty classification, continues |
| Invalid content_field | Stage fails |
| Low confidence | Filtered by min_confidence |
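The two failing cases can often be caught before a pipeline runs. The Python sketch below is a hypothetical pre-flight check, not a product feature. It assumes you already hold a set of valid taxonomy IDs and a sample document, and it interprets an invalid content_field as a field missing from that document.

```python
def preflight(params, known_taxonomy_ids, sample_doc):
    """Return a list of problems that would make the stage fail (empty if none)."""
    problems = []
    taxonomy_id = params.get("taxonomy_id")
    if taxonomy_id not in known_taxonomy_ids:
        problems.append(f"unknown taxonomy_id: {taxonomy_id!r}")
    # content_field falls back to its documented default, "content".
    content_field = params.get("content_field", "content")
    if content_field not in sample_doc:
        problems.append(f"content_field {content_field!r} missing from sample document")
    return problems


known = {"product_categories", "topics"}
sample = {"document_id": "doc_123", "content": "Latest smartphone with 5G connectivity..."}

good = preflight({"taxonomy_id": "product_categories"}, known, sample)
bad = preflight({"taxonomy_id": "prodcut_categories", "content_field": "body"}, known, sample)
print(good)
print(bad)
```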