Features
Understanding features and feature extraction in Mixpeek
Features are the extracted data elements that represent the content and characteristics of your documents. They are the building blocks that enable advanced search and retrieval capabilities.
Overview
Features in Mixpeek are structured data elements extracted from your content during processing. They represent specific aspects of your data such as:
- Text embeddings
- Image descriptors
- Audio transcriptions
- Video scene information
- PDF content structures
- Detected entities and concepts
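Here is a minimal Python sketch of how a processed document and its features might be represented in your own code. The `Feature` and `Document` classes and all field names are hypothetical illustrations. They are not Mixpeek's actual schema or SDK.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Feature:
    """One extracted data element (hypothetical shape, not Mixpeek's schema)."""
    name: str               # e.g. "text_embedding" or "transcript"
    kind: str               # e.g. "vector", "text", "entities"
    value: Any              # the extracted payload itself
    extractor: str          # which extractor produced it
    extractor_version: str  # model/extractor version used at extraction time


@dataclass
class Document:
    """A processed document holding its features, keyed by feature name."""
    document_id: str
    features: dict[str, Feature] = field(default_factory=dict)

    def add(self, feature: Feature) -> None:
        self.features[feature.name] = feature


doc = Document("doc-1")
doc.add(Feature("text_embedding", "vector", [0.12, -0.05, 0.33], "text_extractor", "v1"))
doc.add(Feature("transcript", "text", "Welcome to the demo.", "audio_extractor", "v1"))
doc.add(Feature("entities", "entities", ["Mixpeek", "demo"], "text_extractor", "v1"))
print(sorted(doc.features))
```

Recording the extractor version alongside each feature becomes useful later, because features are tied to the model version that produced them.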
Feature Selection
Choose which characteristics to extract based on your retrieval and analysis requirements.
Extraction Processing
Run specialized extractors that use AI models and algorithms to generate features from your content.
Feature Storage
Store processed features in optimized feature stores designed for fast retrieval and similarity search.
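The selection step can be framed as a coverage check. Given the features your retrieval needs, you pick the extractors whose outputs provide them. This is a minimal illustration only. The catalog below, including every extractor and feature name, is hypothetical and does not reflect Mixpeek's actual extractor list.

```python
# Hypothetical catalog: extractor name -> feature names it outputs.
CATALOG: dict[str, set[str]] = {
    "text_extractor": {"text_embedding", "entities"},
    "image_extractor": {"image_descriptor", "scene_label"},
    "audio_extractor": {"transcript", "speaker_id"},
}


def select_extractors(required: set[str]) -> list[str]:
    """Return the extractors that contribute at least one required feature.

    Raises ValueError if some required feature is produced by no extractor.
    """
    chosen = [name for name, outputs in CATALOG.items() if outputs & required]
    covered = set().union(*(CATALOG[name] for name in chosen))
    missing = required - covered
    if missing:
        raise ValueError(f"no extractor produces: {sorted(missing)}")
    return chosen


print(select_extractors({"text_embedding", "transcript"}))
```

Failing early on an uncovered feature surfaces a configuration gap before any content is processed.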
Feature Extractors
Extractors are pre-built pipelines with defined input and output schemas. Within a collection, extractors run in parallel, and their jobs are dispatched through a queue. Each extractor has optional parameters that you can configure when you define the collection.
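The following Python sketch models extractors under a few simplifying assumptions. Each extractor is reduced to an input content type, a fixed set of output fields, a function, and optional parameters that are set when the collection is defined. A `ThreadPoolExecutor` stands in for the collection's queue, since its internal work queue holds the submitted jobs. The `Extractor` class, the `process` function, and all three toy extractors are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Extractor:
    """Hypothetical extractor definition with input/output schemas."""
    name: str
    input_type: str                    # input schema, reduced to a content type
    output_fields: tuple[str, ...]     # output schema: fields the extractor must return
    run: Callable[[str, dict], dict]   # (content, params) -> {field: value}
    params: dict[str, Any] = field(default_factory=dict)  # optional, set at collection definition


def process(content: str, content_type: str, extractors: list[Extractor]) -> dict:
    """Run every extractor that accepts this content type in parallel."""
    applicable = [ex for ex in extractors if ex.input_type == content_type]
    results: dict = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        # submit() places each job on the executor's internal work queue.
        futures = [(ex, pool.submit(ex.run, content, ex.params)) for ex in applicable]
        for ex, future in futures:
            output = future.result()
            # Enforce the declared output schema.
            if set(output) != set(ex.output_fields):
                raise ValueError(f"{ex.name} violated its output schema")
            results.update(output)
    return results


word_count = Extractor("word_count", "text", ("word_count",),
                       lambda text, p: {"word_count": len(text.split())})
keywords = Extractor("keywords", "text", ("keywords",),
                     lambda text, p: {"keywords": [w for w in text.lower().split()
                                                   if len(w) >= p["min_len"]]},
                     params={"min_len": 9})
thumbnail = Extractor("thumbnail", "image", ("thumbnail",),
                      lambda img, p: {"thumbnail": img[:8]})

print(process("Mixpeek extracts searchable features", "text",
              [word_count, keywords, thumbnail]))
```

The `thumbnail` extractor is skipped because its input type does not match, and `min_len` shows how an optional parameter changes an extractor's output.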
Vector Embeddings
High-dimensional numerical representations of content that capture semantic meaning
Metadata Features
Structured data fields such as categories, timestamps, and attributes
Media-Specific Features
Specialized features like image scene classifications, video timestamps, or audio speaker identification
Relational Features
Features that establish connections between different content items
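This page does not specify how relational features are computed. One common approach, shown here purely as an assumption, is to link items whose embeddings are close. The sketch computes cosine similarity between every pair of items and keeps pairs at or above a threshold. The function names, item IDs, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def related_pairs(embeddings: dict[str, list[float]],
                  threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair at or above the threshold."""
    ids = sorted(embeddings)
    links = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = cosine(embeddings[a], embeddings[b])
            if score >= threshold:
                links.append((a, b, round(score, 3)))
    return links


items = {
    "clip-a": [1.0, 0.0, 0.1],
    "clip-b": [0.9, 0.1, 0.1],
    "clip-c": [0.0, 1.0, 0.0],
}
print(related_pairs(items))
```

Comparing every pair is quadratic, which is acceptable for a handful of items. At scale, a nearest-neighbor index would replace the double loop.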
Extraction Process
Feature extractors create features as part of processing pipelines. This happens in four stages:
Raw Content Analysis
Content is analyzed based on its type (text, image, video, etc.)
Feature Extraction
Specialized extractors process the content to generate features
Feature Normalization
Features are normalized into consistent formats
Storage
Processed features are stored in optimized feature stores
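The four stages can be sketched end to end in a few functions. Everything here is a toy stand-in, not Mixpeek's pipeline:

- Content type is guessed from the file name with Python's `mimetypes` module.
- The "extractor" is a hypothetical vowel-count vector rather than a real model.
- Normalization is assumed to mean L2-normalizing vectors to unit length, which is one common choice.
- Storage is an in-memory dict.

```python
import math
import mimetypes

store: dict[str, list[float]] = {}  # Stage 4 target: a stand-in feature store


def analyze(filename: str) -> str:
    """Stage 1: classify raw content by type (here, guessed from the file name)."""
    mime, _ = mimetypes.guess_type(filename)
    return mime.split("/")[0] if mime else "unknown"


def extract(content_type: str, content: str) -> list[float]:
    """Stage 2: toy extractor producing a 5-dimensional vowel-count vector."""
    if content_type != "text":
        raise NotImplementedError(f"no toy extractor for {content_type!r}")
    return [float(content.count(vowel)) for vowel in "aeiou"]


def normalize(vector: list[float]) -> list[float]:
    """Stage 3: L2-normalize so every stored vector has unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


def ingest(doc_id: str, filename: str, content: str) -> None:
    """Run all four stages for one document."""
    content_type = analyze(filename)
    store[doc_id] = normalize(extract(content_type, content))  # Stage 4: store


ingest("doc-1", "notes.txt", "features are extracted")
print(store["doc-1"])
```

Normalizing to unit length means a later dot product between stored vectors directly equals their cosine similarity.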
Feature Storage
Features are stored in specialized feature stores optimized for efficient retrieval. Unlike traditional database columns, feature stores are designed to handle:
- High-dimensional vector data
- Efficient similarity searching
- Specialized indexes for multimodal content
- Rapid retrieval of specific feature types
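A toy in-memory store illustrates the first two properties: high-dimensional vector data and similarity search. This hypothetical sketch is not Mixpeek's feature store. It uses exact (brute-force) cosine search, whereas production vector stores typically rely on approximate nearest-neighbor indexes to stay fast at scale.

```python
import heapq
import math


class InMemoryFeatureStore:
    """Toy vector store with exact cosine-similarity search (hypothetical)."""

    def __init__(self, dim: int):
        self.dim = dim
        self._vectors: dict[str, list[float]] = {}

    def _unit(self, vector: list[float]) -> list[float]:
        """Validate dimensionality and scale the vector to unit length."""
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("cannot use a zero vector")
        return [x / norm for x in vector]

    def add(self, doc_id: str, vector: list[float]) -> None:
        # Store unit vectors so a dot product equals cosine similarity.
        self._vectors[doc_id] = self._unit(vector)

    def search(self, query: list[float], k: int = 3) -> list[str]:
        """Return the ids of the k most similar stored vectors, best first."""
        q = self._unit(query)
        scored = ((sum(a * b for a, b in zip(q, v)), doc_id)
                  for doc_id, v in self._vectors.items())
        return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


fs = InMemoryFeatureStore(dim=2)
fs.add("cat", [0.9, 0.1])
fs.add("dog", [0.8, 0.3])
fs.add("car", [0.1, 0.95])
print(fs.search([1.0, 0.0], k=2))
```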
Best Practices
Understand Feature Types
Different content requires different feature types. Text works well with embeddings, while images need visual descriptors.
Feature Composition
Combine multiple features across retrieval stages for more accurate retrieval. For example, text and image features used together often give better results than either alone.
Regular Updates
As your content evolves, consider reprocessing to generate updated features with the latest extractors.
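For the Feature Composition practice, one way to merge ranked results from several feature types is reciprocal rank fusion (RRF). RRF scores each document by summing 1/(k + rank) across the result lists. It is a well-known fusion technique chosen here for illustration and is not necessarily how Mixpeek combines retrieval stages. The document IDs are hypothetical, and k = 60 is the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings into one using RRF scores."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


text_hits = ["doc-2", "doc-1", "doc-3"]   # ranking from a text-embedding stage
image_hits = ["doc-1", "doc-4", "doc-2"]  # ranking from an image-feature stage
fused = reciprocal_rank_fusion([text_hits, image_hits])
print(fused)
```

RRF uses only ranks, so it does not require text and image similarity scores to be on comparable scales.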
Limitations
- Extraction Time: Complex feature extraction on large media files may require extended processing time
- Model Specificity: Features are tied to the specific model version used during extraction
- Storage Limits: Feature stores have capacity limits based on your account tier
- Update Constraints: Features cannot be selectively updated; re-extraction of the entire document is required
- Processing Dependencies: Feature extraction depends on the availability of third-party models and services
- Cross-Feature Compatibility: Not all feature types can be directly compared or combined in search operations
- Format Support: Some specialized formats may have limited feature extraction capabilities
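Features are tied to the model version used at extraction and cannot be updated selectively, so reprocessing works at the document level. The sketch below flags every document that has at least one feature produced by an outdated extractor. It assumes you record the extractor version used for each document. The `LATEST` registry and the extractor names are hypothetical.

```python
# Hypothetical registry of the current version of each extractor.
LATEST: dict[str, str] = {"text_extractor": "v2", "image_extractor": "v1"}


def is_stale(versions_used: dict[str, str]) -> bool:
    """True if any feature came from an older version of a known extractor."""
    return any(name in LATEST and LATEST[name] != version
               for name, version in versions_used.items())


def documents_to_reprocess(docs: dict[str, dict[str, str]]) -> list[str]:
    """docs maps document_id -> {extractor_name: version used at extraction}.

    Returns whole document ids, since features cannot be re-extracted
    individually.
    """
    return sorted(doc_id for doc_id, used in docs.items() if is_stale(used))


docs = {
    "doc-a": {"text_extractor": "v1"},
    "doc-b": {"text_extractor": "v2", "image_extractor": "v1"},
    "doc-c": {"image_extractor": "v1", "text_extractor": "v1"},
}
print(documents_to_reprocess(docs))
```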