Taxonomies provide a structured way to organize and classify multimodal content using hierarchical categories. Each node can use multiple embedding models to capture both visual and semantic aspects of content. Taxonomies are best used for:

  • Predefined categories and hierarchies
  • Domain-specific organization
  • Manual, expert-driven classification
  • Strict control over content organization

Quick Start Video

Watch this 5-minute live walkthrough to see taxonomies in action. The video covers creating, managing, and implementing taxonomies for multimodal content classification.


Sample Multimodal Taxonomy

Implementation

POST /entities/taxonomies
{
  "taxonomy_name": "sports_training_library",
  "description": "Hierarchical classification of sports training videos",
  "nodes": [
    {
      "name": "tennis_training",
      "description": "Tennis instruction and training videos",
      "embedding_config": [
        {
          "embedding_model": "vertex-multimodal",
          "type": "video",
          "value": "https://video.mp4"
        },
        {
          "embedding_model": "baai-bge-m3",
          "type": "text",
          "value": "Professional tennis training, serve techniques, court positioning"
        }
      ],
      "children": [
        {
          "name": "serve_technique",
          "parent_node_name": "tennis_training",
          "embedding_config": [
            {
              "embedding_model": "vertex-multimodal",
              "type": "video",
              "value": "https://serving.mp4"
            },
            {
              "embedding_model": "baai-bge-m3",
              "type": "text",
              "value": "Tennis serve mechanics, grip techniques, ball toss training"
            }
          ]
        },
        {
          "name": "return_practice",
          "parent_node_name": "tennis_training",
          "embedding_config": [
            {
              "embedding_model": "vertex-multimodal",
              "type": "video",
              "value": "https://returning.mp4"
            },
            {
              "embedding_model": "baai-bge-m3",
              "type": "text",
              "value": "Tennis return drills, footwork, anticipation training"
            }
          ],
          "children": [
            {
              "name": "backhand_returns",
              "parent_node_name": "return_practice",
              "embedding_config": [
                {
                  "embedding_model": "vertex-multimodal",
                  "type": "video",
                  "value": "https://backhand.mp4"
                },
                {
                  "embedding_model": "baai-bge-m3",
                  "type": "text",
                  "value": "Two-handed backhand return technique and drills"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
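
In a nested payload like the one above, each child repeats its parent's name in parent_node_name, so a copy-paste slip can leave the two out of sync. The following is a minimal Python sketch, not part of the API, that walks a payload of this shape and reports nodes whose declared parent disagrees with their nesting. The helper name find_parent_mismatches is hypothetical, and the sample deliberately contains one wrong parent to show the check firing; whether the service itself rejects such a payload is not stated here.

```python
from typing import Any, Dict, List, Optional


def find_parent_mismatches(
    nodes: List[Dict[str, Any]], parent: Optional[str] = None
) -> List[str]:
    """Return names of nodes whose parent_node_name disagrees with nesting."""
    problems: List[str] = []
    for node in nodes:
        # Root nodes carry no parent_node_name in the example payload;
        # nested nodes are expected to name the node that contains them.
        if node.get("parent_node_name") != parent:
            problems.append(node["name"])
        problems.extend(
            find_parent_mismatches(node.get("children", []), node["name"])
        )
    return problems


# Trimmed copy of the payload shape (embedding_config omitted for brevity).
# backhand_returns intentionally names the wrong parent.
taxonomy = {
    "taxonomy_name": "sports_training_library",
    "nodes": [
        {
            "name": "tennis_training",
            "children": [
                {"name": "serve_technique", "parent_node_name": "tennis_training"},
                {
                    "name": "return_practice",
                    "parent_node_name": "tennis_training",
                    "children": [
                        {
                            "name": "backhand_returns",
                            "parent_node_name": "tennis_training",
                        }
                    ],
                },
            ],
        }
    ],
}

mismatches = find_parent_mismatches(taxonomy["nodes"])
print(mismatches)
```

Running a check like this before sending the request keeps hierarchy errors out of the stored taxonomy.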

Advanced Implementation

# Create development namespace
POST /namespaces
{
  "namespace_name": "sports_training_dev",
  "embedding_models": ["text", "video", "multimodal"]
}

# Create collections
POST /collections # with X-Namespace: sports_training_dev
{
  "collection_name": "training-videos-sample"
}

Searching with Taxonomies

POST /features/search
{
    "collections": ["training-videos-sample"],
    "filters": {
        "AND": [
            {
                "key": "taxonomy_nodes[].node",
                "operator": "in",
                "value": ["serve_technique"]
            }
        ]
    }
}

When filtering by taxonomy_nodes[].node, you can use either the node’s name (e.g., “serve_technique”) or its ID (e.g., “node_abc123”). The system will automatically match either format.
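
Because names and IDs are both accepted, a single filter can mix them. The sketch below only builds the request body as a Python dictionary and serializes it; it does not call the API. The function name taxonomy_node_filter is hypothetical, while the keys, operator, and sample values are taken from the request and note above.

```python
import json
from typing import Any, Dict, List


def taxonomy_node_filter(collections: List[str], nodes: List[str]) -> Dict[str, Any]:
    """Build a search body that matches features classified under any given node."""
    return {
        "collections": collections,
        "filters": {
            "AND": [
                {
                    "key": "taxonomy_nodes[].node",
                    "operator": "in",
                    # Each entry may be a node name or a node ID.
                    "value": nodes,
                }
            ]
        },
    }


body = taxonomy_node_filter(
    ["training-videos-sample"], ["serve_technique", "node_abc123"]
)
print(json.dumps(body, indent=2))
```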

Internal Taxonomy Structure

Features store taxonomy classifications with hierarchical information in a simplified array structure:

{
  "taxonomy_nodes": [
    {
      "node_id": "node_123",
      "depth": 0,
      "score": 0.82,
      "order": [0]
    },
    {
      "node_id": "node_3212",
      "depth": 1,
      "score": 0.91,
      "order": [0, 1]
    }
  ]
}

Each classification contains:

  • node_id: Unique node identifier
  • depth: Level in the taxonomy tree (0 = root)
  • score: Classification confidence score
  • order: Array representing path in tree
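
As an illustration of how these fields can be used on the client side, the sketch below orders a feature's classifications from root to leaf and picks the most specific one. It assumes all entries belong to a single branch of one taxonomy, which the structure above does not guarantee; the function names are hypothetical.

```python
from typing import Any, Dict, List

# Same two classifications as in the example above, listed out of order.
taxonomy_nodes: List[Dict[str, Any]] = [
    {"node_id": "node_3212", "depth": 1, "score": 0.91, "order": [0, 1]},
    {"node_id": "node_123", "depth": 0, "score": 0.82, "order": [0]},
]


def root_to_leaf(classifications: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort classifications by depth so the root comes first."""
    return sorted(classifications, key=lambda c: c["depth"])


def most_specific(classifications: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Pick the deepest classification; score breaks ties at equal depth."""
    return max(classifications, key=lambda c: (c["depth"], c["score"]))


path_ids = [c["node_id"] for c in root_to_leaf(taxonomy_nodes)]
leaf = most_specific(taxonomy_nodes)
print(path_ids, leaf["node_id"])
```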

This structure lets you filter on any of these fields. For example, filtering on depth:

# Find only root-level classifications
POST /features/search
{
  "filters": {
    "AND": [
      {
        "key": "taxonomy_nodes[].depth",
        "operator": "eq",
        "value": 0
      }
    ]
  }
}

Automatic Classification During Ingestion

POST /ingest/videos/url
{
  "url": "https://example.com/tennis-lesson.mp4",
  "collection": "training-videos",
  "feature_extractors": {
    "video": [{
      "embed": [
        {
          "type": "url",
          "embedding_model": "multimodal"
        }
      ],
      "entities": {
        "taxonomies": ["sports_training_library"],
        "confidence_threshold": 0.8
      }
    }]
  }
}
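
The request above does not spell out how confidence_threshold is applied. A plausible reading, shown here purely as an assumption, is that candidate classifications scoring below the threshold are discarded before they are stored. The function name, the inclusive comparison, and the third sample entry are all hypothetical.

```python
from typing import Any, Dict, List


def apply_confidence_threshold(
    classifications: List[Dict[str, Any]], threshold: float
) -> List[Dict[str, Any]]:
    """Keep classifications whose score meets the threshold (assumed inclusive)."""
    return [c for c in classifications if c["score"] >= threshold]


candidates = [
    {"node_id": "node_123", "depth": 0, "score": 0.82},
    {"node_id": "node_3212", "depth": 1, "score": 0.91},
    {"node_id": "node_hypothetical", "depth": 1, "score": 0.40},  # made-up entry
]

kept = apply_confidence_threshold(candidates, 0.8)
kept_ids = [c["node_id"] for c in kept]
print(kept_ids)
```

Under this reading, raising the threshold trades recall for precision: fewer features receive a classification, but the ones that do are more likely to be correct.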

Best Practices for Video Taxonomies

1. Video Description

  • Describe key visual elements
  • Include relevant actions and movements
  • Specify important technical details

2. Multimodal Configuration

  • Use video embeddings for visual content
  • Add text embeddings for semantic context
  • Combine multiple models for better accuracy

3. Hierarchy Design

  • Group similar techniques together
  • Create logical progression paths
  • Maintain consistent categorization

Video embedding processing can be resource-intensive. Consider using key frames or segments for initial classification.
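
One way to act on this advice is to split a video into fixed-length segments and classify one representative timestamp from each. The sketch below is plain arithmetic and does not call any API; the function names are hypothetical, and using segment midpoints as stand-ins for key frames is an assumption rather than a documented behavior.

```python
from typing import List, Tuple


def segment_bounds(duration_s: float, segment_s: float) -> List[Tuple[float, float]]:
    """Split [0, duration_s] into consecutive segments of at most segment_s seconds."""
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration_s and segment_s must be positive")
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        # The last segment is shortened so it ends exactly at the video's end.
        end = min(start + segment_s, float(duration_s))
        bounds.append((start, end))
        start = end
    return bounds


def midpoints(bounds: List[Tuple[float, float]]) -> List[float]:
    """Return one sample timestamp (the midpoint) per segment."""
    return [(start + end) / 2 for start, end in bounds]


segments = segment_bounds(25.0, 10.0)
samples = midpoints(segments)
print(segments)
print(samples)
```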

The vertex-multimodal model can process both video frames and text descriptions, making it ideal for video content classification.

Performance Considerations

Feature Storage Format

Features store taxonomy classifications in the taxonomy_nodes array. Each classification includes:

{
    "taxonomy_nodes": [
        {
            "taxonomy_id": "tax_38c33e0d8f",     // Unique identifier for the taxonomy
            "node_id": "node_9761ac566c6e11",    // Unique identifier for the node
            "score": 0.5,                        // Classification confidence (0-1)
            "depth": 0,                          // Level in hierarchy (0 = root)
            "order": [0]                         // Path representation in the tree
        }
    ]
}

Multiple classifications can be stored for a single feature, allowing content to be categorized under multiple nodes or taxonomies.
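
When a feature carries classifications from more than one taxonomy, grouping them by taxonomy_id makes the result easier to work with. The following is a small client-side sketch; the function name is hypothetical, and the second and third sample entries are made up for illustration (only the first comes from the example above).

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_taxonomy(taxonomy_nodes: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each taxonomy_id to the node_ids a feature was classified under."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for classification in taxonomy_nodes:
        grouped[classification["taxonomy_id"]].append(classification["node_id"])
    return dict(grouped)


feature = {
    "taxonomy_nodes": [
        {"taxonomy_id": "tax_38c33e0d8f", "node_id": "node_9761ac566c6e11",
         "score": 0.5, "depth": 0, "order": [0]},
        # Hypothetical extra classifications for illustration only.
        {"taxonomy_id": "tax_38c33e0d8f", "node_id": "node_child_example",
         "score": 0.7, "depth": 1, "order": [0, 1]},
        {"taxonomy_id": "tax_other_example", "node_id": "node_other_example",
         "score": 0.6, "depth": 0, "order": [0]},
    ]
}

grouped = group_by_taxonomy(feature["taxonomy_nodes"])
print(grouped)
```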