Features are the extracted data elements that represent the content and characteristics of your documents. They are the building blocks that enable advanced search and retrieval capabilities.

Overview

Features in Mixpeek are structured data elements extracted from your content during processing. They represent specific aspects of your data such as:

  • Text embeddings
  • Image descriptors
  • Audio transcriptions
  • Video scene information
  • PDF content structures
  • Detected entities and concepts

Working with features follows three steps:

1

Feature Selection

Choose which characteristics to extract based on your retrieval and analysis requirements.

2

Extraction Processing

Run specialized extractors that process the content to generate features using AI models and algorithms.

3

Feature Storage

Store processed features in optimized feature stores designed for rapid retrieval and similarity searching.

Feature Extractors

Extractors are pre-built pipelines with defined input and output schemas. Within a collection, extractors are queued and run in parallel. Each extractor has optional parameters that you can configure when you define the collection.

Vector Embeddings

High-dimensional numerical representations of content that capture semantic meaning

Metadata Features

Structured data fields such as categories, timestamps, and attributes

Media-Specific Features

Specialized features like image scene classifications, video timestamps, or audio speaker identification

Relational Features

Features that establish connections between different content items
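
To make the idea of an extractor with input and output schemas more concrete, here is a minimal sketch in Python. It is not Mixpeek's actual API: the `ExtractorSpec` class, its field names, the `text_embedder` name, and the `chunk_size` parameter are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExtractorSpec:
    """Hypothetical description of an extractor (not Mixpeek's real schema)."""

    name: str
    input_schema: dict[str, str]   # input field name -> expected type
    output_schema: dict[str, str]  # feature name -> produced type
    parameters: dict[str, Any] = field(default_factory=dict)  # optional settings

    def validate_input(self, document: dict[str, Any]) -> list[str]:
        """Return the required input fields that the document is missing."""
        return [key for key in self.input_schema if key not in document]


# Illustrative extractor: takes text, produces an embedding and a language tag.
text_extractor = ExtractorSpec(
    name="text_embedder",
    input_schema={"text": "string"},
    output_schema={"embedding": "vector", "language": "string"},
    parameters={"chunk_size": 512},  # assumed optional parameter
)

missing = text_extractor.validate_input({"title": "Quarterly report"})
print(missing)  # ['text']
```

Declaring schemas up front lets a collection reject documents that an extractor cannot process before any model is run.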

Extraction Process

Feature extractors create features as part of a processing pipeline. This happens in four stages:

1

Raw Content Analysis

Content is analyzed based on its type (text, image, video, etc.)

2

Feature Extraction

Specialized extractors process the content to generate features

3

Feature Normalization

Features are normalized into consistent formats

4

Storage

Processed features are stored in optimized feature stores
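
The four stages above can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not how Mixpeek implements extraction: the function names are hypothetical, the "extractor" computes two trivial text statistics instead of calling a model, normalization is assumed to mean scaling to unit length, and a plain dictionary stands in for a feature store.

```python
import math


def analyze(content: dict) -> str:
    """Stage 1: decide how to process the content from its declared type."""
    return content.get("type", "unknown")


def extract(content: dict, content_type: str) -> list[float]:
    """Stage 2: stand-in extractor; a real one would call a model."""
    if content_type == "text":
        words = content["data"].split()
        # Toy features: word count and mean word length.
        mean_length = sum(len(w) for w in words) / max(len(words), 1)
        return [float(len(words)), mean_length]
    raise ValueError(f"no extractor for type {content_type!r}")


def normalize(vector: list[float]) -> list[float]:
    """Stage 3: scale to unit length (one assumed form of normalization)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


def store(feature_store: dict, doc_id: str, vector: list[float]) -> None:
    """Stage 4: persist the feature; a dict stands in for a feature store."""
    feature_store[doc_id] = vector


feature_store: dict[str, list[float]] = {}
doc = {"type": "text", "data": "features enable search"}
vector = normalize(extract(doc, analyze(doc)))
store(feature_store, "doc-1", vector)
```

Keeping each stage as a separate function mirrors the staged description: a new content type only needs a new branch in the extraction stage, while normalization and storage stay unchanged.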

Feature Storage

Features are stored in specialized feature stores optimized for efficient retrieval. Unlike traditional database columns, feature stores are designed to handle:

  • High-dimensional vector data
  • Efficient similarity searching
  • Specialized indexes for multimodal content
  • Rapid retrieval of specific feature types
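
As a rough illustration of what similarity searching over stored vectors involves, the sketch below ranks stored features by cosine similarity to a query vector. The store contents and function names are made up for the example, and the search is brute force. Production feature stores generally rely on specialized indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2):
    """Return the k stored items most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 3-dimensional features; real embeddings have far more dimensions.
store = {
    "cat-photo": [0.9, 0.1, 0.0],
    "dog-photo": [0.8, 0.3, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], store, k=2)
print([doc_id for doc_id, _ in results])  # ['cat-photo', 'dog-photo']
```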

Best Practices

Understand Feature Types

Different content requires different feature types. Text works well with embeddings, while images need visual descriptors.

Feature Composition

Combine multiple features across retrieval stages for more accurate results. For example, using text and image features together can give better results than using either alone.
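
One way to picture feature composition is a two-stage search: shortlist candidates by text similarity, then re-rank the shortlist with a blend of text and image scores. The sketch below assumes the similarity scores have already been computed. The function name, the score values, and the equal weighting are illustrative assumptions rather than Mixpeek's retrieval API.

```python
def two_stage_search(
    text_scores: dict[str, float],
    image_scores: dict[str, float],
    candidates: int = 3,
    text_weight: float = 0.5,
) -> list[str]:
    """Shortlist by text score, then re-rank by a weighted text/image blend."""
    # Stage 1: keep only the best candidates according to text similarity.
    shortlist = sorted(text_scores, key=text_scores.get, reverse=True)[:candidates]

    # Stage 2: re-rank the shortlist using both feature types.
    def blended(doc_id: str) -> float:
        image_score = image_scores.get(doc_id, 0.0)
        return text_weight * text_scores[doc_id] + (1 - text_weight) * image_score

    return sorted(shortlist, key=blended, reverse=True)


# Made-up similarity scores for four documents.
text_scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.1}
image_scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.95}

ranking = two_stage_search(text_scores, image_scores)
print(ranking)  # ['b', 'a', 'c']
```

Document "d" has the best image score but never reaches stage 2, which shows how the order of stages shapes the final result.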

Regular Updates

As your content evolves, consider reprocessing to generate updated features with the latest extractors.
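
If you record which model version produced each feature, finding content that is due for reprocessing becomes a simple comparison. The sketch below assumes a hypothetical record layout with `doc_id`, `extractor`, and `model_version` fields. Mixpeek's actual feature metadata may be structured differently.

```python
def needs_reprocessing(records: list[dict], current_versions: dict[str, str]) -> list[str]:
    """Return sorted IDs of documents with features from an outdated model version."""
    stale = set()
    for record in records:
        latest = current_versions.get(record["extractor"])
        # Skip extractors with no known current version.
        if latest is not None and record["model_version"] != latest:
            stale.add(record["doc_id"])
    return sorted(stale)


# Hypothetical feature records and extractor versions.
records = [
    {"doc_id": "doc-1", "extractor": "text_embedder", "model_version": "v1"},
    {"doc_id": "doc-2", "extractor": "text_embedder", "model_version": "v2"},
    {"doc_id": "doc-3", "extractor": "image_tagger", "model_version": "v1"},
]
current_versions = {"text_embedder": "v2", "image_tagger": "v1"}

print(needs_reprocessing(records, current_versions))  # ['doc-1']
```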

Limitations

  • Extraction Time: Complex feature extraction on large media files may require extended processing time
  • Model Specificity: Features are tied to the specific model version used during extraction
  • Storage Limits: Feature stores have capacity limits based on your account tier
  • Update Constraints: Features cannot be selectively updated; re-extraction of the entire document is required
  • Processing Dependencies: Feature extraction depends on the availability of third-party models and services
  • Cross-Feature Compatibility: Not all feature types can be directly compared or combined in search operations
  • Format Support: Some specialized formats may have limited feature extraction capabilities