Taxonomy Enrich stage showing document classification with hierarchical taxonomies
The Taxonomy Enrich stage classifies documents against predefined taxonomies, adding structured category labels and hierarchical classifications to your search results.
Stage Category: APPLY (Enriches documents with classifications)
Transformation: N documents → N documents (with taxonomy labels added)

When to Use

| Use Case | Description |
| --- | --- |
| Content categorization | Auto-classify documents into topics |
| Faceted search | Add filterable category facets |
| Compliance tagging | Apply regulatory classifications |
| Product taxonomy | Classify into product hierarchies |

When NOT to Use

| Scenario | Recommended Alternative |
| --- | --- |
| Free-form tagging | llm_enrichment |
| Pre-classified content | Skip this stage |
| Custom classification logic | api_call to custom service |

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| taxonomy_id | string | Required | ID of the taxonomy to use |
| content_field | string | content | Field to classify |
| result_field | string | taxonomy | Field for classification results |
| max_depth | integer | null | Maximum hierarchy depth |
| top_k | integer | 3 | Number of top classifications |
| min_confidence | float | 0.5 | Minimum confidence threshold |
| include_ancestors | boolean | true | Include parent categories |
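
The interaction between top_k, min_confidence, and max_depth can be sketched in Python. This is a minimal illustration, not the stage's actual implementation: the `select_classifications` function and the candidate format are hypothetical, and the sketch assumes that the confidence threshold is inclusive, that it is applied before the top_k cut, and that depth equals the number of dot-separated segments in a node ID.

```python
from typing import Optional


def select_classifications(
    candidates: list[dict],
    top_k: int = 3,
    min_confidence: float = 0.5,
    max_depth: Optional[int] = None,
) -> list[dict]:
    """Hypothetical sketch: drop weak or too-deep candidates, then keep the top_k."""
    # Assumption: the threshold is inclusive and applied before the top_k cut.
    kept = [c for c in candidates if c["confidence"] >= min_confidence]
    if max_depth is not None:
        # Assumption: depth is the number of dot-separated segments in the ID.
        kept = [c for c in kept if len(c["id"].split(".")) <= max_depth]
    # Highest confidence first, then truncate to top_k entries.
    kept.sort(key=lambda c: c["confidence"], reverse=True)
    return kept[:top_k]


candidates = [
    {"id": "electronics.mobile.smartphones", "confidence": 0.95},
    {"id": "electronics.mobile", "confidence": 0.82},
    {"id": "electronics", "confidence": 0.78},
    {"id": "home.kitchen", "confidence": 0.31},
]

# Only the single best match above 0.7 survives.
selected = select_classifications(candidates, top_k=1, min_confidence=0.7)
print(selected)
```

Under these assumptions, raising min_confidence can return fewer than top_k results, including none at all.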

Configuration Examples

{
  "stage_type": "apply",
  "stage_id": "taxonomy_enrich",
  "parameters": {
    "taxonomy_id": "product_categories",
    "content_field": "content",
    "result_field": "categories"
  }
}

Output Schema

Basic Classification

{
  "document_id": "doc_123",
  "content": "Latest smartphone with 5G connectivity...",
  "categories": {
    "primary": {
      "id": "electronics.mobile.smartphones",
      "name": "Smartphones",
      "confidence": 0.95
    },
    "all": [
      {"id": "electronics.mobile.smartphones", "name": "Smartphones", "confidence": 0.95},
      {"id": "electronics.mobile", "name": "Mobile Devices", "confidence": 0.82},
      {"id": "electronics", "name": "Electronics", "confidence": 0.78}
    ]
  }
}

With Ancestors

{
  "document_id": "doc_456",
  "content": "Investment banking services...",
  "industry": {
    "primary": {
      "id": "finance.banking.investment",
      "name": "Investment Banking",
      "confidence": 0.91
    },
    "ancestors": [
      {"id": "finance.banking", "name": "Banking", "level": 2},
      {"id": "finance", "name": "Finance", "level": 1}
    ],
    "path": "Finance > Banking > Investment Banking"
  }
}

Low Confidence (No Match)

{
  "document_id": "doc_789",
  "content": "Random unrelated content...",
  "categories": {
    "primary": null,
    "all": [],
    "message": "No classifications above confidence threshold"
  }
}
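
Because `primary` is null when nothing clears the confidence threshold, client code that reads the result field should guard against it. The helper below is a hypothetical sketch (the name `primary_category_id` is not part of any API); it only assumes the output shapes shown above.

```python
from typing import Optional


def primary_category_id(document: dict, result_field: str = "taxonomy") -> Optional[str]:
    """Return the primary classification ID, or None when there was no match."""
    # The result field may be absent, and "primary" may be null (None).
    result = document.get(result_field) or {}
    primary = result.get("primary")
    return primary["id"] if primary else None


matched = {
    "document_id": "doc_123",
    "categories": {
        "primary": {
            "id": "electronics.mobile.smartphones",
            "name": "Smartphones",
            "confidence": 0.95,
        },
    },
}
unmatched = {"document_id": "doc_789", "categories": {"primary": None, "all": []}}

print(primary_category_id(matched, "categories"))
print(primary_category_id(unmatched, "categories"))
```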

Taxonomy Structure

Taxonomies are hierarchical classification systems:
Electronics
├── Mobile Devices
│   ├── Smartphones
│   ├── Tablets
│   └── Wearables
├── Computers
│   ├── Laptops
│   ├── Desktops
│   └── Components
└── Audio
    ├── Headphones
    └── Speakers
Each node has:
  • ID: Dot-notation path (e.g., electronics.mobile.smartphones)
  • Name: Human-readable label
  • Level: Depth in hierarchy (1 = root)
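
Since IDs are dot-notation paths, a node's level and ancestors can be derived from the ID alone. The following sketch uses hypothetical helper names (`level_of`, `ancestors_of`) and assumes that every prefix of an ID is itself a valid node ID, as in the examples above.

```python
def level_of(node_id: str) -> int:
    """Depth in the hierarchy, where a root node such as 'finance' is level 1."""
    return len(node_id.split("."))


def ancestors_of(node_id: str) -> list[dict]:
    """Ancestor IDs of a dot-notation node ID, nearest parent first."""
    parts = node_id.split(".")
    # Each ancestor is a proper prefix of the ID; its level is the prefix length.
    return [
        {"id": ".".join(parts[:level]), "level": level}
        for level in range(len(parts) - 1, 0, -1)
    ]


print(level_of("finance.banking.investment"))
print(ancestors_of("finance.banking.investment"))
```

For `finance.banking.investment` this yields `finance.banking` at level 2 followed by `finance` at level 1, the same order as the `ancestors` array in the output example.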

Performance

| Metric | Value |
| --- | --- |
| Latency | 10-50 ms per document |
| Batch processing | Automatic |
| Model type | Embedding-based classification |
| Parallel execution | Up to 20 concurrent |
Pre-compute taxonomy embeddings for faster classification. Use top_k: 1 and higher min_confidence when you only need the best match.

Common Pipeline Patterns

Search + Classify + Filter

[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "taxonomy_enrich",
    "parameters": {
      "taxonomy_id": "product_categories",
      "result_field": "category",
      "top_k": 1,
      "min_confidence": 0.7
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "structured_filter",
    "parameters": {
      "conditions": {
        "field": "category.primary.id",
        "operator": "starts_with",
        "value": "{{INPUT.category_filter}}"
      }
    }
  }
]
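
The final filter step relies on the dot-notation IDs: a `starts_with` match on `category.primary.id` selects a node and everything beneath it. The Python sketch below illustrates that idea only; `get_path` and `starts_with` are hypothetical helpers, and the actual operator semantics of structured_filter are an assumption here.

```python
from typing import Any


def get_path(document: dict, path: str) -> Any:
    """Resolve a dotted field path such as 'category.primary.id'."""
    current: Any = document
    for key in path.split("."):
        # A missing or null intermediate value resolves to None.
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def starts_with(document: dict, field: str, value: str) -> bool:
    """Assumed semantics: plain string-prefix match on the resolved field."""
    actual = get_path(document, field)
    return isinstance(actual, str) and actual.startswith(value)


docs = [
    {"document_id": "a", "category": {"primary": {"id": "electronics.mobile.smartphones"}}},
    {"document_id": "b", "category": {"primary": {"id": "electronics.audio.headphones"}}},
    {"document_id": "c", "category": {"primary": None}},
]

kept = [d["document_id"] for d in docs if starts_with(d, "category.primary.id", "electronics.mobile")]
print(kept)
```

If the operator is a plain prefix match as assumed, a partial segment such as `electronics.mob` would also match, so passing a complete node ID as the filter value is the safer choice. Documents with no primary classification are dropped by this filter.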

Multi-Taxonomy Classification

[
  {
    "stage_type": "filter",
    "stage_id": "semantic_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 50
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "taxonomy_enrich",
    "parameters": {
      "taxonomy_id": "topics",
      "result_field": "topic"
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "taxonomy_enrich",
    "parameters": {
      "taxonomy_id": "sentiment",
      "result_field": "sentiment"
    }
  }
]

Faceted Search Results

[
  {
    "stage_type": "filter",
    "stage_id": "hybrid_search",
    "parameters": {
      "query": "{{INPUT.query}}",
      "vector_index": "text_extractor_v1_embedding",
      "top_k": 100
    }
  },
  {
    "stage_type": "apply",
    "stage_id": "taxonomy_enrich",
    "parameters": {
      "taxonomy_id": "categories",
      "result_field": "facets",
      "top_k": 3,
      "include_ancestors": true
    }
  }
]
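
Once documents carry a `facets` field, facet counts for a results page can be aggregated client-side. This is a sketch under stated assumptions: `facet_counts` is a hypothetical helper, and the result shape (optional `primary`, `all`, and `ancestors` keys) is taken from the Output Schema examples rather than from a confirmed specification.

```python
from collections import Counter


def facet_counts(documents: list[dict], result_field: str = "facets") -> Counter:
    """Count how many documents fall under each taxonomy node."""
    counts: Counter = Counter()
    for doc in documents:
        result = doc.get(result_field) or {}
        # Use a set so each node is counted at most once per document.
        ids = set()
        if result.get("primary"):
            ids.add(result["primary"]["id"])
        for entry in result.get("all") or []:
            ids.add(entry["id"])
        for entry in result.get("ancestors") or []:
            ids.add(entry["id"])
        counts.update(ids)
    return counts


docs = [
    {"facets": {"primary": {"id": "electronics.mobile.smartphones"},
                "ancestors": [{"id": "electronics.mobile"}, {"id": "electronics"}]}},
    {"facets": {"primary": {"id": "electronics.audio.headphones"},
                "ancestors": [{"id": "electronics.audio"}, {"id": "electronics"}]}},
    {"facets": {"primary": None, "all": []}},
]

counts = facet_counts(docs)
print(counts.most_common(3))
```

Counting ancestors as well as the primary node lets a facet panel show totals at every level of the hierarchy, which is the reason to set include_ancestors to true in this pattern.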

Error Handling

| Error | Behavior |
| --- | --- |
| Unknown taxonomy_id | Stage fails |
| No match found | Empty classification, continues |
| Invalid content_field | Stage fails |
| Low confidence | Filtered by min_confidence |