Taxonomies in Mixpeek provide a structured way to organize and classify multimodal content using customizable hierarchical categories. Each node can use multiple embedding models to capture both visual and semantic aspects of content.

Sample Taxonomy Structure
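
The implementation below creates the following hierarchy, where each node carries both a video and a text embedding:

Sports Training Library
└── Tennis Training
    ├── Serve Technique
    └── Return Practice
        └── Backhand Returns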

Implementation

POST /entities/taxonomies
{
  "taxonomy_name": "Sports Training Library",
  "description": "Hierarchical classification of sports training videos",
  "nodes": [
    {
      "name": "Tennis Training",
      "description": "Tennis instruction and training videos",
      "embedding_config": [
        {
          "embedding_model": "vertex-multimodal",
          "type": "video",
          "value": "https://video.mp4"
        },
        {
          "embedding_model": "baai-bge-m3",
          "type": "text",
          "value": "Professional tennis training, serve techniques, court positioning"
        }
      ],
      "children": [
        {
          "name": "Serve Technique",
          "parent_node_name": "Tennis Training",
          "embedding_config": [
            {
              "embedding_model": "vertex-multimodal",
              "type": "video",
              "value": "https://serving.mp4"
            },
            {
              "embedding_model": "baai-bge-m3",
              "type": "text",
              "value": "Tennis serve mechanics, grip techniques, ball toss training"
            }
          ]
        },
        {
          "name": "Return Practice",
          "parent_node_name": "Tennis Training",
          "embedding_config": [
            {
              "embedding_model": "vertex-multimodal",
              "type": "video",
              "value": "https://returning.mp4"
            },
            {
              "embedding_model": "baai-bge-m3",
              "type": "text",
              "value": "Tennis return drills, footwork, anticipation training"
            }
          ],
          "children": [
            {
              "name": "Backhand Returns",
              "parent_node_name": "Return Practice",
              "embedding_config": [
                {
                  "embedding_model": "vertex-multimodal",
                  "type": "video",
                  "value": "https://backhand.mp4"
                },
                {
                  "embedding_model": "baai-bge-m3",
                  "type": "text",
                  "value": "Two-handed backhand return technique and drills"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
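
For reference, here is a minimal sketch of sending this request with Python's requests library. The base URL and bearer-token header are assumptions, not documented values; substitute your actual endpoint and credentials:

import requests

BASE_URL = "https://api.mixpeek.com"  # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # assumed auth scheme

# A single-node payload; extend "nodes" with children as in the example above.
taxonomy = {
    "taxonomy_name": "Sports Training Library",
    "description": "Hierarchical classification of sports training videos",
    "nodes": [
        {
            "name": "Tennis Training",
            "description": "Tennis instruction and training videos",
            "embedding_config": [
                {
                    "embedding_model": "baai-bge-m3",
                    "type": "text",
                    "value": "Professional tennis training, serve techniques, court positioning",
                }
            ],
        }
    ],
}

resp = requests.post(f"{BASE_URL}/entities/taxonomies", json=taxonomy, headers=HEADERS)
resp.raise_for_status()
print(resp.json())  # the returned taxonomy ID is used for search and ingestion below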

Advanced Implementation

# Create development namespace
POST /namespaces
{
  "namespace_name": "sports_training_dev",
  "embedding_models": ["text", "video", "multimodal"]
}

# Create collections
POST /collections # with X-Namespace: sports_training_dev
{
  "collection_name": "training-videos-sample"
}

Searching with Taxonomies

# Filter results to Tennis Training videos
POST /features/search
{
  "collections": ["training-videos"],
  "filters": {
    "AND": [
      {
        "key": "entities.taxonomy_classifications.node_ids",
        "operator": "in",
        "value": ["tax_6cf982d452"]  # Tennis Training videos
      }
    ]
  }
}

Automatic Classification During Ingestion

POST /ingest/videos/url
{
  "url": "https://example.com/tennis-lesson.mp4",
  "collection": "training-videos",
  "feature_extractors": {
    "video": [{
      "embed": [
        {
          "type": "url",
          "embedding_model": "multimodal"
        }
      ],
      "entities": {
        "taxonomy_ids": ["tax_6cf982d452"],
        "confidence_threshold": 0.8
      }
    }]
  }
}

Best Practices for Video Taxonomies

1. Video Description
  • Describe key visual elements
  • Include relevant actions and movements
  • Specify important technical details

2. Multimodal Configuration
  • Use video embeddings for visual content
  • Add text embeddings for semantic context
  • Combine multiple models for better accuracy (see the example node configuration after this list)

3. Hierarchy Design
  • Group similar techniques together
  • Create logical progression paths
  • Maintain consistent categorization
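
Putting these practices together, a node configuration pairs a concrete, action-oriented text description with a representative video clip (the node name and URLs here are illustrative):

{
  "name": "Footwork Drills",
  "parent_node_name": "Tennis Training",
  "embedding_config": [
    {
      "embedding_model": "vertex-multimodal",
      "type": "video",
      "value": "https://footwork.mp4"
    },
    {
      "embedding_model": "baai-bge-m3",
      "type": "text",
      "value": "Tennis footwork drills, split-step timing, lateral movement and recovery"
    }
  ]
}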

Video embedding processing can be resource-intensive. Consider using key frames or segments for initial classification.
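
For example, a minimal sketch of sampling one frame every couple of seconds with OpenCV before classification (the sampling interval and file name are arbitrary choices):

import cv2

def sample_frames(video_path, every_n_seconds=2.0):
    # Grab one frame every `every_n_seconds` to keep embedding costs down.
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unavailable
    step = max(int(fps * every_n_seconds), 1)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)  # BGR ndarray; encode/upload as needed
        index += 1
    cap.release()
    return frames

key_frames = sample_frames("tennis-lesson.mp4")
print(f"Sampled {len(key_frames)} frames for classification")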

The vertex-multimodal model can process both video frames and text descriptions, making it ideal for video content classification.

Performance Considerations