Clusters in Mixpeek are groups of similar features that are either discovered automatically or defined manually. They help you organize, search, and analyze your multimodal features efficiently.

How Clustering Works

1. Feature Extraction

Assets are processed into features that represent, among other things:

  • Visual content
  • Objects
  • Spoken words
  • Metadata

2. Similarity Calculation

Features are compared using:

  • Vector similarity
  • Semantic relationships
  • Temporal proximity

3. Cluster Formation

Similar features are grouped based on:

  • Distance thresholds
  • Density patterns
  • User-defined rules
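
The similarity and grouping steps above can be illustrated with a minimal sketch. It assumes features are plain numeric vectors, uses cosine similarity as the vector similarity, and groups vectors transitively whenever a pair meets a similarity threshold. The function names and the threshold value are illustrative assumptions, not Mixpeek's internal implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def group_by_threshold(vectors, min_similarity=0.9):
    """Group vector indices into connected components of the
    'similarity >= min_similarity' graph (hypothetical helper)."""
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= min_similarity:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy feature vectors: two point mostly along x, two mostly along y.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = group_by_threshold(features, min_similarity=0.9)
print(clusters)
```

The pairwise loop is quadratic in the number of features, which is acceptable for a toy example but is the reason sampling and optimized distance computations matter at scale.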

Use Cases

Content Organization

Automatically organize video libraries by:

  • Content type
  • Visual similarity
  • Semantic themes

Pattern Discovery

Uncover hidden patterns in your content:

  • Common scenes
  • Recurring themes
  • Related sequences

Search Enhancement

Improve search efficiency through:

  • Cluster-based filtering
  • Contextual recommendations
  • Similar content discovery

Quality Control

Monitor and maintain content quality by:

  • Identifying outliers
  • Detecting anomalies
  • Validating content consistency
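
One simple way to picture outlier identification is to flag cluster members that sit far from the cluster centroid. The sketch below assumes Euclidean distance and a hand-picked `max_distance`; both are illustrative choices rather than documented Mixpeek behavior.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_outliers(vectors, max_distance):
    """Return indices of vectors farther than max_distance from the centroid."""
    center = centroid(vectors)
    return [i for i, v in enumerate(vectors) if euclidean(v, center) > max_distance]

# Three tightly packed members and one distant one.
cluster_members = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
outliers = find_outliers(cluster_members, max_distance=2.0)
print(outliers)
```

Note that a distant member also pulls the centroid toward itself, so a median-based center or a per-cluster threshold may be more robust in practice.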

Implementation

# Discover clusters automatically
POST /entities/cluster/discover
{
  "collection_name": "video_features",
  "method": "dbscan",
  "settings": {
    "sample_size": 1000,
    "neighbor_limit": 20
  }
}
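
As a rough illustration, the same request could be assembled from Python with the standard library. The base URL, the bearer-token header, and the helper name `build_discover_request` are placeholders assumed for this sketch; only the path and the JSON body come from the example above. Check the API reference for the real host and authentication scheme.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical placeholder host
API_KEY = "YOUR_API_KEY"              # hypothetical placeholder credential

def build_discover_request(collection_name, method="dbscan",
                           sample_size=1000, neighbor_limit=20):
    """Build (but do not send) a POST request for cluster discovery."""
    payload = {
        "collection_name": collection_name,
        "method": method,
        "settings": {
            "sample_size": sample_size,
            "neighbor_limit": neighbor_limit,
        },
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/entities/cluster/discover",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; replace with whatever your account uses.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

request = build_discover_request("video_features")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```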

Architecture

Performance Considerations

Clustering large feature sets can be computationally intensive. To keep the cost manageable, Mixpeek uses Qdrant’s optimized distance matrix calculations and supports sample-based clustering, which works from a subset of features rather than the full set.
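
The idea behind sample-based clustering can be sketched as follows: pick cluster representatives from a random sample only, then assign every feature to its nearest representative. This uses a simple "leader" heuristic with an assumed `radius` parameter; it illustrates the general technique and is not a description of how Mixpeek or Qdrant implement it.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pick_leaders(sample, radius):
    """A sampled point becomes a leader if no existing leader is within radius."""
    leaders = []
    for point in sample:
        if all(euclidean(point, leader) > radius for leader in leaders):
            leaders.append(point)
    return leaders

def cluster_with_sample(points, sample_size, radius, seed=0):
    """Find leaders on a random sample, then label all points by nearest leader."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    leaders = pick_leaders(sample, radius)
    labels = [
        min(range(len(leaders)), key=lambda k: euclidean(p, leaders[k]))
        for p in points
    ]
    return leaders, labels

points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
          [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
leaders, labels = cluster_with_sample(points, sample_size=4, radius=1.0)
print(len(leaders), labels)
```

The expensive step (choosing representatives) touches only `sample_size` points, while the full set needs just one nearest-representative lookup per point. The trade-off is that small clusters can be missed if none of their members land in the sample.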

When using manual clusters, ensure your taxonomy terms are specific enough to avoid overlapping clusters that could impact search precision.

Advanced Features

Hierarchical Organization

A feature can belong to more than one cluster, and each cluster can sit at its own path in a hierarchy:

{
  "clusters": [
    {
      "cluster_id": "clu_123",
      "path": "sports/tennis/serve",
      "confidence": 0.95
    },
    {
      "cluster_id": "clu_456",
      "path": "training/technique",
      "confidence": 0.88
    }
  ]
}

Use cluster hierarchies to create intuitive navigation structures for your content.
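
As a sketch of how such paths might feed a navigation structure, the snippet below folds slash-separated `path` values into a nested dictionary, optionally skipping low-confidence assignments. The helper name and the `min_confidence` parameter are assumptions made for illustration.

```python
def build_hierarchy(assignments, min_confidence=0.0):
    """Fold slash-separated cluster paths into a nested dict (a simple tree)."""
    tree = {}
    for item in assignments:
        if item["confidence"] < min_confidence:
            continue  # skip assignments below the confidence cut-off
        node = tree
        for part in item["path"].split("/"):
            node = node.setdefault(part, {})  # descend, creating levels as needed
    return tree

# Cluster assignments for a single feature, shaped like the response above.
assignments = [
    {"path": "sports/tennis/serve", "confidence": 0.95},
    {"path": "training/technique", "confidence": 0.88},
]

tree = build_hierarchy(assignments)
confident_tree = build_hierarchy(assignments, min_confidence=0.9)
print(tree)
print(confident_tree)
```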

Best Practices

1. Preparation

  • Clean and normalize your feature data
  • Choose appropriate clustering parameters
  • Define clear taxonomy rules

2. Implementation

  • Start with sample-based clustering
  • Validate cluster quality
  • Monitor cluster distributions

3. Optimization

  • Adjust parameters based on results
  • Refine taxonomy terms
  • Balance cluster sizes
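
Monitoring cluster distributions can start with something as small as a size report. The sketch below assumes you already have one cluster label per feature and flags any cluster holding more than an arbitrary share of all features; the `max_share` cut-off is an illustrative assumption.

```python
from collections import Counter

def cluster_size_report(labels, max_share=0.5):
    """Summarize cluster sizes and flag clusters above max_share of all features."""
    counts = Counter(labels)
    total = len(labels)
    return {
        label: {
            "size": size,
            "share": size / total,
            "oversized": size / total > max_share,
        }
        for label, size in counts.items()
    }

# One cluster label per feature (toy data).
labels = ["a", "a", "a", "a", "b", "c"]
report = cluster_size_report(labels, max_share=0.5)
for label, stats in report.items():
    print(label, stats)
```

An oversized cluster is often a hint that the distance threshold is too loose or that a taxonomy term is too broad.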

Updates and Maintenance

Keep your clusters up to date by periodically re-running the clustering process as new content arrives. Mixpeek handles incremental updates efficiently.
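
To make the idea of an incremental update concrete, here is a minimal sketch: a new feature joins its nearest existing cluster if it is close enough, and that centroid is updated as a running mean. Otherwise the feature starts a new cluster. The source does not describe Mixpeek's actual mechanism, so the function, the `max_distance` parameter, and the centroid representation are all assumptions.

```python
import math

def assign_incrementally(centroids, counts, new_vector, max_distance):
    """Assign new_vector to the nearest centroid within max_distance,
    updating that centroid as a running mean; otherwise open a new cluster.
    Mutates centroids and counts in place and returns the cluster index."""
    best_index, best_distance = None, float("inf")
    for k, center in enumerate(centroids):
        d = math.dist(new_vector, center)  # Euclidean distance (Python 3.8+)
        if d < best_distance:
            best_index, best_distance = k, d

    if best_index is not None and best_distance <= max_distance:
        counts[best_index] += 1
        n = counts[best_index]
        center = centroids[best_index]
        # Running mean: move the centroid 1/n of the way toward the new vector.
        centroids[best_index] = [c + (x - c) / n for c, x in zip(center, new_vector)]
        return best_index

    centroids.append(list(new_vector))
    counts.append(1)
    return len(centroids) - 1

centroids = [[0.0, 0.0], [10.0, 10.0]]
counts = [2, 1]
first = assign_incrementally(centroids, counts, [1.0, 1.0], max_distance=3.0)
second = assign_incrementally(centroids, counts, [50.0, 50.0], max_distance=3.0)
print(first, second, counts)
```

Updates like this are cheap, but centroids can drift over time, which is one reason a periodic full re-clustering is still worthwhile.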