Clusters
Discover, organize, and search multimodal features using automatic and manual clustering
Clustering is available only to enterprise customers. Email info@mixpeek.com for a demo.
Clusters in Mixpeek are groups of similar features, either discovered automatically or defined manually. They enable efficient organization, search, and analysis of your multimodal features. Clusters are best used for:
- Discovering natural content groupings
- Detecting patterns
- Automating organization
- Letting the content organize itself
How Clustering Works
Feature Extraction
Assets are processed into features that represent, among other things:
- Visual content
- Objects
- Spoken words
- Metadata
Similarity Calculation
Features are compared using:
- Vector similarity
- Semantic relationships
- Temporal proximity
Cluster Formation
Similar features are grouped based on:
- Distance thresholds
- Density patterns
- User-defined rules
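The similarity and cluster-formation steps above can be sketched in a few lines. This is a minimal illustration, assuming cosine similarity over embedding vectors and a simple greedy distance-threshold rule; it is not Mixpeek's actual algorithm, and `cosine_similarity`, `threshold_clusters`, and `min_similarity` are hypothetical names chosen for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def threshold_clusters(vectors, min_similarity=0.9):
    """Greedy grouping: a vector joins the first cluster whose seed
    (its first member) is similar enough, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        for members in clusters:
            seed = vectors[members[0]]
            if cosine_similarity(vec, seed) >= min_similarity:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy 2-D "embeddings": two point roughly along x, two roughly along y.
features = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.95],
]
groups = threshold_clusters(features, min_similarity=0.9)
print(groups)
```

Raising `min_similarity` produces more, tighter clusters; lowering it merges looser groups. Density-based methods replace the fixed threshold with a rule about how many neighbors a point has.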
Use Cases
Content Organization
Automatically organize video libraries by:
- Content type
- Visual similarity
- Semantic themes
Pattern Discovery
Uncover hidden patterns in your content:
- Common scenes
- Recurring themes
- Related sequences
Search Enhancement
Improve search efficiency through:
- Cluster-based filtering
- Contextual recommendations
- Similar content discovery
Quality Control
Monitor and maintain content quality by:
- Identifying outliers
- Detecting anomalies
- Validating content consistency
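One common way to identify outliers within a cluster is to flag members that sit unusually far from the cluster centroid. The sketch below assumes Euclidean distance and a fixed cutoff; `find_outliers` and `max_distance` are hypothetical names for illustration and do not describe how Mixpeek detects anomalies.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def find_outliers(vectors, max_distance):
    """Return indices of vectors farther than `max_distance` from the centroid."""
    center = centroid(vectors)
    return [
        i for i, vec in enumerate(vectors)
        if math.dist(vec, center) > max_distance
    ]

# Three tightly grouped members and one that is far away.
cluster_members = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0]]
outliers = find_outliers(cluster_members, max_distance=2.0)
print(outliers)
```

A fixed cutoff is the simplest choice. In practice the cutoff is often derived from the distribution of distances within the cluster, for example a multiple of the median distance.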
Implementation
Internal Cluster Structure
Features store cluster assignments in a simplified array structure.
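As a rough illustration of what an array of cluster assignments on a feature might look like, consider the sketch below. Every field name here (`feature_id`, `clusters`, `cluster_id`, `confidence`) is an assumption made for the example and may not match Mixpeek's actual schema.

```python
# Hypothetical feature record; field names are illustrative assumptions.
feature = {
    "feature_id": "feat_123",
    "clusters": [
        {"cluster_id": "clu_sports", "confidence": 0.92},
        {"cluster_id": "clu_outdoor", "confidence": 0.61},
    ],
}

def cluster_ids(feature, min_confidence=0.0):
    """List the cluster IDs on a feature whose confidence meets the cutoff."""
    return [
        assignment["cluster_id"]
        for assignment in feature.get("clusters", [])
        if assignment["confidence"] >= min_confidence
    ]

all_ids = cluster_ids(feature)
confident_ids = cluster_ids(feature, min_confidence=0.8)
print(all_ids, confident_ids)
```

Keeping assignments as a flat array on each feature means a feature can belong to several clusters at once, and membership checks do not require a separate lookup.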
Searching with Clusters
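A cluster-aware search typically narrows the candidate set to one cluster and then ranks what remains by similarity to the query. The following is a minimal in-memory sketch of that idea; it does not call the Mixpeek API, and the record layout, `search_in_cluster`, and `top_k` are hypothetical names used only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search_in_cluster(query, features, cluster_id, top_k=2):
    """Filter features to one cluster, then rank them by similarity to `query`."""
    candidates = [
        f for f in features
        if any(c["cluster_id"] == cluster_id for c in f["clusters"])
    ]
    ranked = sorted(
        candidates,
        key=lambda f: cosine_similarity(query, f["vector"]),
        reverse=True,
    )
    return [f["feature_id"] for f in ranked[:top_k]]

# Hypothetical feature records with toy 2-D vectors.
features = [
    {"feature_id": "a", "vector": [1.0, 0.0], "clusters": [{"cluster_id": "c1"}]},
    {"feature_id": "b", "vector": [0.7, 0.7],
     "clusters": [{"cluster_id": "c1"}, {"cluster_id": "c2"}]},
    {"feature_id": "c", "vector": [0.0, 1.0], "clusters": [{"cluster_id": "c2"}]},
]
results = search_in_cluster([0.0, 1.0], features, cluster_id="c2")
print(results)
```

Filtering first keeps the similarity ranking cheap, because only features inside the chosen cluster are scored.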
Best Practices for Video Clustering
Preprocessing
- Extract features at appropriate intervals (10-15 seconds recommended)
- Use the scene detection parameter to identify natural segment boundaries
- Consider both visual and audio features for complete context
- Normalize video resolution and quality for consistent processing
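To make the interval recommendation concrete, the sketch below splits a video's duration into fixed-length segments. It covers fixed-interval sampling only, not scene detection, and `segment_boundaries` and `interval_s` are hypothetical names rather than Mixpeek parameters.

```python
def segment_boundaries(duration_s, interval_s=10.0):
    """Split [0, duration_s] into consecutive (start, end) segments of at most
    `interval_s` seconds; the final segment may be shorter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + interval_s, duration_s)
        segments.append((start, end))
        start = end
    return segments

segments = segment_boundaries(35.0, interval_s=10.0)
print(segments)
```

Shorter intervals give finer-grained clusters at the cost of more features to process. Scene detection replaces these fixed boundaries with ones that follow the content.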
Model Selection
- Use multimodal embeddings for combined visual-semantic understanding
- Consider one of our specialized models for specific content types (such as sports or ads)
- Balance model complexity with processing requirements
- Test different embedding combinations for optimal results
Cluster Configuration
- Start with conservative clustering parameters (higher min_cluster_size)
- Adjust confidence thresholds based on content similarity requirements
- Use appropriate sample sizes for initial cluster discovery
- Enable automatic naming for better cluster interpretability
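The effect of a higher `min_cluster_size` can be seen in a small example: groups with fewer members than the minimum are treated as noise instead of being kept as clusters. The sketch below assumes the common convention of labeling noise as `-1`; `apply_min_cluster_size` is a hypothetical helper, not part of Mixpeek.

```python
from collections import Counter

NOISE = -1  # assumed convention for "not assigned to any cluster"

def apply_min_cluster_size(labels, min_cluster_size):
    """Relabel members of clusters smaller than `min_cluster_size` as noise."""
    sizes = Counter(labels)
    return [
        label if sizes[label] >= min_cluster_size else NOISE
        for label in labels
    ]

# Raw labels: cluster 0 has 3 members, cluster 1 has 1, cluster 2 has 2.
labels = [0, 0, 0, 1, 2, 2]
conservative = apply_min_cluster_size(labels, min_cluster_size=3)
permissive = apply_min_cluster_size(labels, min_cluster_size=2)
print(conservative, permissive)
```

Starting conservative yields fewer, larger clusters that are easier to review. Lowering the minimum afterwards surfaces smaller groups once the large ones look right.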
Performance Optimization
- Batch process similar video content together
- Cache frequently accessed cluster assignments
- Use appropriate indexing strategies for faster lookups
- Monitor and adjust resource utilization
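Caching cluster assignments can be as simple as memoizing the lookup function. The sketch below uses Python's standard `functools.lru_cache`, with an in-memory dictionary standing in for a real store; `CLUSTER_STORE` and `get_cluster_assignments` are hypothetical names for illustration.

```python
from functools import lru_cache

# Stand-in for a database or remote service (hypothetical data).
CLUSTER_STORE = {
    "feat_1": ("clu_a",),
    "feat_2": ("clu_a", "clu_b"),
}
lookup_count = 0  # counts how often the underlying store is actually hit

@lru_cache(maxsize=1024)
def get_cluster_assignments(feature_id):
    """Return a feature's cluster IDs, caching results per feature ID."""
    global lookup_count
    lookup_count += 1
    return CLUSTER_STORE.get(feature_id, ())

first = get_cluster_assignments("feat_2")   # reads from the store
second = get_cluster_assignments("feat_2")  # served from the cache
print(first, second, lookup_count)
```

Cached assignments go stale when clusters are recomputed, so call `get_cluster_assignments.cache_clear()` after a reclustering run.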
Video clustering can be resource-intensive. Consider these limitations:
- Maximum video duration: 4 hours
- Maximum file size: 2 GB
- Processing timeout: 30 minutes
- Rate limits apply to clustering requests
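Checking a video against the duration and size limits before submitting it avoids wasted uploads. The sketch below is a client-side pre-check only; the constant names and `check_video_limits` are hypothetical, and it assumes 2 GB means 2 × 10^9 bytes, which the limits above do not specify.

```python
MAX_DURATION_S = 4 * 60 * 60        # 4 hours, in seconds
MAX_FILE_SIZE_BYTES = 2 * 10**9     # assumption: decimal gigabytes (the stricter reading)

def check_video_limits(duration_s, size_bytes):
    """Return a list of human-readable problems; an empty list means the
    video is within the documented duration and size limits."""
    problems = []
    if duration_s > MAX_DURATION_S:
        problems.append(f"duration {duration_s}s exceeds {MAX_DURATION_S}s")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"size {size_bytes} bytes exceeds {MAX_FILE_SIZE_BYTES} bytes")
    return problems

ok = check_video_limits(duration_s=3600, size_bytes=500 * 10**6)
too_big = check_video_limits(duration_s=5 * 3600, size_bytes=3 * 10**9)
print(ok, too_big)
```

Videos that fail the check can be split into shorter segments or re-encoded at a lower bitrate before processing.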
For optimal results, combine clustering with taxonomies when dealing with domain-specific video content. This provides both automated discovery and structured organization.