Clusters
Discover, organize, and search multimodal features using automatic and manual clustering
Clustering is available only to enterprise customers. Email info@mixpeek.com for a demo.
Clusters in Mixpeek are groups of similar features that are either discovered automatically or defined manually. They let you organize, search, and analyze your multimodal features efficiently.
How Clustering Works
Feature Extraction
Assets are processed into features that represent, among other things:
- Visual content
- Objects
- Spoken words
- Metadata
Similarity Calculation
Features are compared using:
- Vector similarity
- Semantic relationships
- Temporal proximity
Cluster Formation
Similar features are grouped based on:
- Distance thresholds
- Density patterns
- User-defined rules
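The three stages above can be sketched in a few lines of Python. This is a minimal illustration of vector similarity combined with a distance threshold, not Mixpeek's actual clustering algorithm; the function names and the `min_similarity` parameter are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def threshold_clusters(vectors, min_similarity=0.9):
    """Greedy threshold grouping (illustrative only).

    Each vector joins the first cluster whose seed (its first member)
    is at least `min_similarity` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of vector indices.
    """
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            seed = vectors[members[0]]
            if cosine_similarity(vec, seed) >= min_similarity:
                members.append(i)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([i])
    return clusters
```

A density-based method would group differently, but the same idea applies: similarity scores plus a threshold or rule decide which features belong together.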
Use Cases
Content Organization
Automatically organize video libraries by:
- Content type
- Visual similarity
- Semantic themes
Pattern Discovery
Uncover hidden patterns in your content:
- Common scenes
- Recurring themes
- Related sequences
Search Enhancement
Improve search efficiency through:
- Cluster-based filtering
- Contextual recommendations
- Similar content discovery
Quality Control
Monitor and maintain content quality by:
- Identifying outliers
- Detecting anomalies
- Validating content consistency
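One simple way to think about outlier identification is to measure how far each feature sits from its cluster's centroid and flag the ones that are unusually far away. The sketch below assumes plain Python lists as feature vectors and a hypothetical `z_cutoff` parameter; it illustrates the idea and is not a description of how Mixpeek detects anomalies.

```python
import math
import statistics


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_outliers(vectors, z_cutoff=2.0):
    """Return indices of vectors whose centroid distance has a z-score above z_cutoff."""
    center = centroid(vectors)
    distances = [euclidean(v, center) for v in vectors]
    mean = statistics.fmean(distances)
    spread = statistics.pstdev(distances)
    if spread == 0:
        # All members are equally far from the centroid: nothing stands out.
        return []
    return [i for i, d in enumerate(distances) if (d - mean) / spread > z_cutoff]
```

Flagged indices are candidates for manual review rather than automatic removal, since a distant feature may simply belong to a different cluster.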
Implementation
Internal Cluster Structure
Features store cluster assignments in a simplified array structure:
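As a purely illustrative sketch, an array of assignments on a feature might look like the record below. Every key shown (`feature_id`, `clusters`, `cluster_id`, `name`, `confidence`) is an assumption made for this example and not Mixpeek's documented schema.

```python
# Hypothetical feature record; all keys are illustrative assumptions.
feature = {
    "feature_id": "feat_001",
    "clusters": [
        {"cluster_id": "cl_sports", "name": "Sports highlights", "confidence": 0.92},
        {"cluster_id": "cl_outdoor", "name": "Outdoor scenes", "confidence": 0.61},
    ],
}


def cluster_ids(feature, min_confidence=0.0):
    """Return the IDs of clusters assigned with at least min_confidence."""
    return [
        assignment["cluster_id"]
        for assignment in feature.get("clusters", [])
        if assignment["confidence"] >= min_confidence
    ]
```

Keeping assignments as a flat array on the feature means a feature can belong to several clusters at once, and a lookup needs no join against a separate table.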
Searching with Clusters
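Cluster-based search typically narrows the candidate set to one cluster first and then ranks the remaining features by similarity. The in-memory sketch below shows that pattern under stated assumptions: the record keys (`feature_id`, `vector`, `cluster_ids`) and the function `search_in_cluster` are hypothetical and do not correspond to a Mixpeek API call.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_in_cluster(query_vector, features, cluster_id, top_k=3):
    """Filter features to one cluster, then rank them by similarity to the query."""
    candidates = [f for f in features if cluster_id in f["cluster_ids"]]
    ranked = sorted(
        candidates,
        key=lambda f: cosine_similarity(query_vector, f["vector"]),
        reverse=True,
    )
    return [f["feature_id"] for f in ranked[:top_k]]
```

Filtering before ranking means the similarity computation runs only over members of the chosen cluster, which is where the efficiency gain comes from.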
Best Practices for Video Clustering
Preprocessing
- Extract features at appropriate intervals (10-15 seconds recommended)
- Use the scene detection parameter to identify natural segment boundaries
- Consider both visual and audio features for complete context
- Normalize video resolution and quality for consistent processing
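To make the interval and scene-boundary guidance concrete, the sketch below computes segment start times by combining fixed-interval ticks with detected scene cuts. The function name, its parameters, and the idea of passing scene cuts as a list of timestamps are assumptions for illustration; the actual scene detection parameter is configured in Mixpeek itself.

```python
def segment_boundaries(duration_s, interval_s=10.0, scene_cuts=()):
    """Return sorted segment start times in seconds.

    Combines fixed-interval ticks (0, interval, 2*interval, ...) with any
    scene cut timestamps that fall inside the video.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    starts = set()
    n = 0
    # Multiply instead of repeatedly adding to avoid floating-point drift.
    while n * interval_s < duration_s:
        starts.add(round(float(n * interval_s), 3))
        n += 1
    for cut in scene_cuts:
        if 0 <= cut < duration_s:
            starts.add(round(float(cut), 3))
    return sorted(starts)
```

For a 35-second clip with a scene cut at 12.5 seconds, this yields starts at 0, 10, 12.5, 20, and 30 seconds, so no segment runs longer than the chosen interval and none straddles the cut.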
Model Selection
- Use multimodal embeddings for combined visual-semantic understanding
- Consider one of our specialized models for specific content types (for example, sports or ads)
- Balance model complexity with processing requirements
- Test different embedding combinations for optimal results
Cluster Configuration
- Start with conservative clustering parameters (higher min_cluster_size)
- Adjust confidence thresholds based on content similarity requirements
- Use appropriate sample sizes for initial cluster discovery
- Enable automatic naming for better cluster interpretability
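The effect of a higher `min_cluster_size` and a confidence threshold can be illustrated with a small post-filter over cluster assignments. This is a sketch of the concept only: the tuple layout, the `min_confidence` name, and the function itself are assumptions, and Mixpeek may apply these parameters differently during clustering.

```python
from collections import defaultdict


def filter_clusters(assignments, min_cluster_size=5, min_confidence=0.7):
    """Keep confident assignments, then drop clusters that end up too small.

    `assignments` is an iterable of (feature_id, cluster_id, confidence) tuples.
    Returns a dict mapping cluster_id to a list of feature_ids.
    """
    groups = defaultdict(list)
    for feature_id, cluster_id, confidence in assignments:
        if confidence >= min_confidence:
            groups[cluster_id].append(feature_id)
    return {
        cluster_id: members
        for cluster_id, members in groups.items()
        if len(members) >= min_cluster_size
    }
```

Starting conservative yields fewer, larger clusters; lowering either value later surfaces smaller or looser groups once the initial results look sensible.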
Performance Optimization
- Batch process similar video content together
- Cache frequently accessed cluster assignments
- Use appropriate indexing strategies for faster lookups
- Monitor and adjust resource utilization
Video clustering can be resource-intensive. Consider these limitations:
- Maximum video duration: 4 hours
- Maximum file size: 2GB
- Processing timeout: 30 minutes
- Rate limits apply to clustering requests
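A client-side check against the duration and size limits can catch oversized videos before they are submitted. The sketch below is an assumption-laden example: the constant and function names are hypothetical, and "2GB" is interpreted here as 2 GiB (2 × 1024³ bytes), which may not match the exact definition the service uses.

```python
MAX_DURATION_S = 4 * 60 * 60      # 4 hours, from the limits listed above
MAX_FILE_BYTES = 2 * 1024 ** 3    # assumes "2GB" means 2 GiB


def check_video_limits(duration_s, size_bytes):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if duration_s > MAX_DURATION_S:
        problems.append(
            f"duration {duration_s}s exceeds the {MAX_DURATION_S}s maximum"
        )
    if size_bytes > MAX_FILE_BYTES:
        problems.append(
            f"file size {size_bytes} bytes exceeds the {MAX_FILE_BYTES} byte maximum"
        )
    return problems
```

Videos that fail the check can be split or transcoded before upload. The processing timeout and rate limits are enforced server-side and cannot be validated this way.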
For optimal results, combine clustering with taxonomies when dealing with domain-specific video content. This provides both automated discovery and structured organization.