Taxonomies in Mixpeek provide a structured way to organize and classify multimodal content using customizable hierarchical categories. Each node can use multiple embedding models to capture both visual and semantic aspects of content.

Quick Start Video

Watch this 5-minute live walkthrough to see taxonomies in action. The video covers creating, managing, and implementing taxonomies for multimodal content classification.


Sample Multimodal Taxonomy

Implementation

POST /entities/taxonomies
{
  "taxonomy_name": "sports_training_library",
  "description": "Hierarchical classification of sports training videos",
  "nodes": [
    {
      "name": "tennis_training",
      "description": "Tennis instruction and training videos",
      "embedding_config": [
        {
          "embedding_model": "vertex-multimodal",
          "type": "video",
          "value": "https://video.mp4"
        },
        {
          "embedding_model": "baai-bge-m3",
          "type": "text",
          "value": "Professional tennis training, serve techniques, court positioning"
        }
      ],
      "children": [
        {
          "name": "serve_technique",
          "parent_node_name": "tennis_training",
          "embedding_config": [
            {
              "embedding_model": "vertex-multimodal",
              "type": "video",
              "value": "https://serving.mp4"
            },
            {
              "embedding_model": "baai-bge-m3",
              "type": "text",
              "value": "Tennis serve mechanics, grip techniques, ball toss training"
            }
          ]
        },
        {
          "name": "return_practice",
          "parent_node_name": "tennis_training",
          "embedding_config": [
            {
              "embedding_model": "vertex-multimodal",
              "type": "video",
              "value": "https://returning.mp4"
            },
            {
              "embedding_model": "baai-bge-m3",
              "type": "text",
              "value": "Tennis return drills, footwork, anticipation training"
            }
          ],
          "children": [
            {
              "name": "backhand_returns",
              "parent_node_name": "return_practice",
              "embedding_config": [
                {
                  "embedding_model": "vertex-multimodal",
                  "type": "video",
                  "value": "https://backhand.mp4"
                },
                {
                  "embedding_model": "baai-bge-m3",
                  "type": "text",
                  "value": "Two-handed backhand return technique and drills"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
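
The request above can be sent with any HTTP client. Below is a minimal Python sketch using requests; the base URL and bearer-token header are assumptions, so substitute your own endpoint and API key:

import requests

BASE_URL = "https://api.mixpeek.com"  # assumed endpoint; replace with your deployment's base URL
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
    "Content-Type": "application/json",
}

taxonomy = {
    "taxonomy_name": "sports_training_library",
    "description": "Hierarchical classification of sports training videos",
    "nodes": [
        {
            "name": "tennis_training",
            "description": "Tennis instruction and training videos",
            "embedding_config": [
                {"embedding_model": "vertex-multimodal", "type": "video", "value": "https://video.mp4"},
                {"embedding_model": "baai-bge-m3", "type": "text",
                 "value": "Professional tennis training, serve techniques, court positioning"},
            ],
            # children omitted for brevity; they follow the same shape as the payload above
        }
    ],
}

response = requests.post(f"{BASE_URL}/entities/taxonomies", json=taxonomy, headers=HEADERS)
response.raise_for_status()
print(response.json())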

Advanced Implementation

# Create development namespace
POST /namespaces
{
  "namespace_name": "sports_training_dev",
  "embedding_models": ["text", "video", "multimodal"]
}

# Create collections
POST /collections # with X-Namespace: sports_training_dev
{
  "collection_name": "training-videos-sample"
}
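
A sketch of the same two calls in Python; note that the collection is created inside the development namespace by passing the X-Namespace header (base URL and API key are placeholders, as above):

import requests

BASE_URL = "https://api.mixpeek.com"  # assumed endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder credential

# Create the development namespace
requests.post(
    f"{BASE_URL}/namespaces",
    json={
        "namespace_name": "sports_training_dev",
        "embedding_models": ["text", "video", "multimodal"],
    },
    headers=HEADERS,
).raise_for_status()

# Create a collection scoped to that namespace via the X-Namespace header
requests.post(
    f"{BASE_URL}/collections",
    json={"collection_name": "training-videos-sample"},
    headers={**HEADERS, "X-Namespace": "sports_training_dev"},
).raise_for_status()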

Searching with Taxonomies

POST /features/search
{
  "collections": ["training-videos-sample"],
  "filters": {
    "AND": [
      {
        "key": "entities[].node",
        "operator": "in",
        "value": ["serve_technique"]  # Can use node name
      }
    ]
  }
}

When filtering by entities[].node, you can use either the node’s name (e.g., “serve_technique”) or its ID (e.g., “node_abc123”). The system will automatically match either format.
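
For example, the same search issued from Python (base URL, API key, and namespace header are assumptions carried over from the sketches above):

import requests

BASE_URL = "https://api.mixpeek.com"  # assumed endpoint
HEADERS = {
    "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
    "X-Namespace": "sports_training_dev",    # namespace from the setup above
}

search_body = {
    "collections": ["training-videos-sample"],
    "filters": {
        "AND": [
            {
                "key": "entities[].node",
                "operator": "in",
                # the node name is used here; the node ID (e.g. "node_abc123") is equally valid
                "value": ["serve_technique"],
            }
        ]
    },
}

response = requests.post(f"{BASE_URL}/features/search", json=search_body, headers=HEADERS)
response.raise_for_status()
print(response.json())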

Internal Taxonomy Structure

Features store taxonomy classifications with hierarchical information in a simplified array structure:

{
  "entities": [
    {
      "node_id": "tax_tennis_training",
      "depth": 0,
      "score": 0.82,
      "order": [0]
    },
    {
      "node_id": "tax_return_practice",
      "depth": 1,
      "score": 0.91,
      "order": [0, 1]
    }
  ]
}

Each classification contains:

  • node_id: Unique node identifier
  • depth: Level in the taxonomy tree (0 = root)
  • score: Classification confidence score
  • order: Array representing path in tree
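
As an illustration of how these fields can be used client-side, the Python sketch below (not part of the API) picks the deepest classification on a feature that clears a confidence threshold:

def deepest_confident_classification(feature, min_score=0.8):
    """Return the deepest taxonomy classification whose score clears min_score."""
    confident = [e for e in feature.get("entities", []) if e["score"] >= min_score]
    if not confident:
        return None
    # prefer more specific (deeper) nodes, then higher scores
    return max(confident, key=lambda e: (e["depth"], e["score"]))

feature = {
    "entities": [
        {"node_id": "tax_tennis_training", "depth": 0, "score": 0.82, "order": [0]},
        {"node_id": "tax_return_practice", "depth": 1, "score": 0.91, "order": [0, 1]},
    ]
}
print(deepest_confident_classification(feature))
# -> {'node_id': 'tax_return_practice', 'depth': 1, 'score': 0.91, 'order': [0, 1]}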

This structure enables several filtering patterns:

# Find only root-level classifications
POST /features/search
{
  "filters": {
    "AND": [
      {
        "key": "entities[].depth",
        "operator": "eq",
        "value": 0
      }
    ]
  }
}
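
These clauses can also be combined. A small helper sketch that builds an AND filter restricting results to given nodes and/or a given depth, using the same field keys as the examples above:

def taxonomy_filter(nodes=None, depth=None):
    """Build an AND filter over taxonomy entities (field keys follow the search examples above)."""
    clauses = []
    if nodes:
        clauses.append({"key": "entities[].node", "operator": "in", "value": list(nodes)})
    if depth is not None:
        clauses.append({"key": "entities[].depth", "operator": "eq", "value": depth})
    return {"AND": clauses}

# Only features classified under return_practice at the second level of the tree
body = {
    "collections": ["training-videos-sample"],
    "filters": taxonomy_filter(nodes=["return_practice"], depth=1),
}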

Automatic Classification During Ingestion

POST /ingest/videos/url
{
  "url": "https://example.com/tennis-lesson.mp4",
  "collection": "training-videos",
  "feature_extractors": {
    "video": [{
      "embed": [
        {
          "type": "url",
          "embedding_model": "multimodal"
        }
      ],
      "entities": {
        "taxonomies": ["sports_training_library"],
        "confidence_threshold": 0.8
      }
    }]
  }
}
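
The same ingestion request from Python, reusing the assumed base URL and credentials from the earlier sketches:

import requests

BASE_URL = "https://api.mixpeek.com"  # assumed endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder credential

ingest_body = {
    "url": "https://example.com/tennis-lesson.mp4",
    "collection": "training-videos",
    "feature_extractors": {
        "video": [{
            "embed": [{"type": "url", "embedding_model": "multimodal"}],
            "entities": {
                "taxonomies": ["sports_training_library"],
                "confidence_threshold": 0.8,  # minimum score for a node to be attached
            },
        }]
    },
}

response = requests.post(f"{BASE_URL}/ingest/videos/url", json=ingest_body, headers=HEADERS)
response.raise_for_status()
print(response.json())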

Best Practices for Video Taxonomies

1. Video Description
   • Describe key visual elements
   • Include relevant actions and movements
   • Specify important technical details

2. Multimodal Configuration
   • Use video embeddings for visual content
   • Add text embeddings for semantic context
   • Combine multiple models for better accuracy

3. Hierarchy Design
   • Group similar techniques together
   • Create logical progression paths
   • Maintain consistent categorization

Video embedding processing can be resource-intensive. Consider using key frames or segments for initial classification.
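
One way to reduce that cost is to classify a sampled set of key frames rather than the full video. The sketch below uses OpenCV (opencv-python) purely as an illustration; it is not part of the Mixpeek API:

import cv2  # pip install opencv-python

def sample_key_frames(video_path, every_n_seconds=10):
    """Grab one frame every N seconds as JPEG bytes for lightweight classification."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30  # fall back if FPS metadata is missing
    step = int(fps * every_n_seconds)
    frames, index = [], 0
    while True:
        cap.set(cv2.CAP_PROP_POS_FRAMES, index)
        ok, frame = cap.read()
        if not ok:
            break
        ok, jpeg = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(jpeg.tobytes())
        index += step
    cap.release()
    return frames

key_frames = sample_key_frames("tennis-lesson.mp4")
print(f"extracted {len(key_frames)} key frames")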

The vertex-multimodal model can process both video frames and text descriptions, making it ideal for video content classification.

Performance Considerations