Concepts
Key concepts for using a Multimodal Data Warehouse
Understanding these concepts will help you get the most out of the Mixpeek Multimodal Warehouse offerings.
Mixpeek organizes data in a structured hierarchy designed for flexible, high-performance processing and retrieval of multimodal content.
Mixpeek Term | Description | Data Warehouse Analogy |
---|---|---|
Namespace | Query boundaries that isolate environments | Database/Schema |
Bucket | Storage containers for raw objects and files | Raw Data Lake/Storage Layer |
Object | Collections of related input files | Raw Data Files/Source Documents |
Blob | Individual raw files within Objects | Binary Data/Single File |
Collection | Groups of processed documents with consistent schema | Table |
Document | Structured outputs from feature extractors | Row |
Feature Extractor | Specialized components that process inputs to extract specific features | ETL Process/Transformation |
Feature | Extracted data elements stored in feature stores | Column/Field |
Feature Store | Specialized storage for extracted features optimized for efficient retrieval | Indexed Columns/Materialized Views |
Retriever | Query engines that search feature stores to find relevant documents | SQL Query Engine |
Retriever Stage | Components of search pipelines that perform specific operations in the retrieval process | Query Execution Plan Step |
Taxonomy | Multimodal equivalent of SQL JOIN operations | JOIN Operation |
Clustering | Multimodal equivalent of SQL GROUP BY operations | GROUP BY Operation |
Research | Multi-step process that explores topics through iterative searches, generates structured reports with sections, and combines retrieved information into cohesive content | Business Intelligence Report |
Component Relationships
The different components in Mixpeek relate to each other in specific ways:
Understanding the Relationships
Processing Components
Feature Extractors
Specialized components that process inputs to extract specific features like embeddings, detected objects, or transcriptions
Retrievers
Query engines that search feature stores to find relevant documents
Multimodal Analogs to SQL Operations
Mixpeek provides specialized components that function as multimodal analogs to traditional SQL operations:
Taxonomies
Taxonomies in Mixpeek serve as the multimodal equivalent of SQL JOIN operations. They allow you to enrich documents with metadata from other collections based on feature similarity rather than exact key matches.
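To make the JOIN analogy concrete, here is a minimal sketch of a similarity-based join in plain Python. It is not the Mixpeek API: the function names, the `taxonomy_label` field, the toy embeddings, and the 0.8 threshold are all assumptions chosen for illustration. Each document is matched to the taxonomy node whose embedding is most similar, instead of being matched on an exact key.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_join(documents, taxonomy_nodes, threshold=0.8):
    """Attach the best-matching taxonomy label to each document.

    Hypothetical helper: a document only receives a label when its
    best match scores at or above `threshold`; otherwise the label is None.
    """
    enriched = []
    for doc in documents:
        best_label, best_score = None, -1.0
        for node in taxonomy_nodes:
            score = cosine_similarity(doc["embedding"], node["embedding"])
            if score > best_score:
                best_label, best_score = node["label"], score
        result = dict(doc)  # copy so the input document is not mutated
        result["taxonomy_label"] = best_label if best_score >= threshold else None
        enriched.append(result)
    return enriched


# Toy data (assumed): three documents and two taxonomy nodes.
documents = [
    {"id": "doc-1", "embedding": [0.9, 0.1, 0.0]},
    {"id": "doc-2", "embedding": [0.0, 0.2, 0.9]},
    {"id": "doc-3", "embedding": [0.5, 0.5, 0.5]},
]
taxonomy_nodes = [
    {"label": "sports", "embedding": [1.0, 0.0, 0.0]},
    {"label": "cooking", "embedding": [0.0, 0.0, 1.0]},
]

enriched = similarity_join(documents, taxonomy_nodes)
```

In this sketch, `doc-3` sits between both nodes and falls below the threshold, so it stays unlabeled, much like a row with no match in a LEFT JOIN.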
Data Flow Architecture
Storage Layer (Buckets)
Raw objects and their associated files are stored in buckets. Objects represent collections of related files (e.g., a marketing campaign with video, script, and legal documents).
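The bucket, object, and blob relationship can be sketched with a few dataclasses. This is an illustrative model only, assuming hypothetical class and field names (`Blob`, `SourceObject`, `Bucket`, `media_type`); it does not reflect Mixpeek's actual schema or client library.

```python
from dataclasses import dataclass, field


@dataclass
class Blob:
    """A single raw file inside an object."""
    name: str
    media_type: str


@dataclass
class SourceObject:
    """A collection of related input files (blobs)."""
    object_id: str
    blobs: list = field(default_factory=list)


@dataclass
class Bucket:
    """A storage container holding objects keyed by their ID."""
    name: str
    objects: dict = field(default_factory=dict)

    def add_object(self, obj):
        self.objects[obj.object_id] = obj


# The marketing campaign example from the text: one object, three files.
campaign = SourceObject(
    object_id="campaign-001",
    blobs=[
        Blob("ad.mp4", "video/mp4"),
        Blob("script.txt", "text/plain"),
        Blob("legal.pdf", "application/pdf"),
    ],
)

bucket = Bucket(name="marketing-assets")
bucket.add_object(campaign)
```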
Processing Flow (Feature Extractors)
Objects from buckets are processed through feature extractors. Feature extractors extract various features from the object’s files, which are then organized into documents stored in collections.
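The processing flow can be sketched as a function that runs an extractor over each file in an object and writes the results into a collection as documents. Everything here is a simplified assumption: the `word_count_extractor` is a toy stand-in for a real feature extractor, and the dictionary keys are hypothetical rather than Mixpeek field names.

```python
def word_count_extractor(blob):
    """Toy extractor: counts words in text blobs, skips other media types."""
    if blob["media_type"] != "text/plain":
        return None
    return {"word_count": len(blob["content"].split())}


def process_object(obj, extractor, collection):
    """Run `extractor` over every blob and append one document per result.

    Each document keeps a reference back to the object it came from.
    """
    for blob in obj["blobs"]:
        features = extractor(blob)
        if features is None:
            continue  # this extractor does not handle this blob type
        collection.append(
            {
                "document_id": f"{obj['object_id']}:{blob['name']}",
                "source_object_id": obj["object_id"],
                **features,
            }
        )
    return collection


# Toy object with one text file and one video file (contents assumed).
obj = {
    "object_id": "campaign-001",
    "blobs": [
        {"name": "script.txt", "media_type": "text/plain", "content": "Buy one get one free"},
        {"name": "ad.mp4", "media_type": "video/mp4", "content": b""},
    ],
}

collection = process_object(obj, word_count_extractor, [])
```

Only the text file yields a document here, because the toy extractor ignores video; a real pipeline would typically run several extractors, each suited to a different modality.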
Feature Storage
Extracted features are stored in specialized feature stores. Each feature maintains a reference to its parent document, and each document maintains a reference to its source object.
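The reference chain described above (feature to document to object) can be illustrated with three small lookup tables. The IDs, key names, and the `trace_lineage` helper are hypothetical and exist only to show how such references could be followed.

```python
# Toy stores (assumed layout): each record points to its parent by ID.
objects = {"obj-1": {"blobs": ["ad.mp4"]}}
documents = {"doc-1": {"source_object_id": "obj-1", "title": "Ad transcript"}}
feature_store = {"feat-1": {"document_id": "doc-1", "vector": [0.1, 0.9]}}


def trace_lineage(feature_id):
    """Follow a feature back to its parent document and source object."""
    feature = feature_store[feature_id]
    document_id = feature["document_id"]
    object_id = documents[document_id]["source_object_id"]
    if object_id not in objects:
        raise KeyError(f"source object {object_id!r} not found")
    return {"feature": feature_id, "document": document_id, "object": object_id}


lineage = trace_lineage("feat-1")
```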
Retrieval Flow (Retrieval Pipelines)
Queries are processed through retrieval pipelines that search feature stores to find relevant features. The matching features are then used to locate their parent documents in collections.
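As a rough sketch of this retrieval flow, the function below ranks features by cosine similarity to a query vector and then resolves each matching feature to its parent document, skipping documents that were already returned. The function names, the `top_k` parameter, and the toy vectors are assumptions, not Mixpeek's retriever interface.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, feature_store, collection, top_k=2):
    """Return up to `top_k` distinct parent documents, best match first."""
    ranked = sorted(
        feature_store,
        key=lambda f: cosine_similarity(query_vector, f["vector"]),
        reverse=True,
    )
    results, seen = [], set()
    for feature in ranked:
        doc_id = feature["document_id"]
        if doc_id in seen:
            continue  # several features can point to the same document
        seen.add(doc_id)
        results.append(collection[doc_id])
        if len(results) == top_k:
            break
    return results


# Toy feature store: two features belong to doc-a.
feature_store = [
    {"document_id": "doc-a", "vector": [1.0, 0.0]},
    {"document_id": "doc-a", "vector": [0.9, 0.1]},
    {"document_id": "doc-b", "vector": [0.0, 1.0]},
    {"document_id": "doc-c", "vector": [0.7, 0.7]},
]
collection = {
    "doc-a": {"id": "doc-a"},
    "doc-b": {"id": "doc-b"},
    "doc-c": {"id": "doc-c"},
}

hits = retrieve([1.0, 0.0], feature_store, collection)
```

Deduplicating by parent document matters because one document can contribute many features; without it, the top results could all be the same document.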
Metadata and Document Properties
All documents in Mixpeek collections include standard metadata properties:
Fields prefixed with double underscores (__) are reserved for system metadata. Do not use this prefix for your custom fields.
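If you want to catch reserved names before sending data, a small client-side check along these lines may help. This is a hypothetical helper written for illustration, not part of Mixpeek; it only encodes the double-underscore rule stated above.

```python
RESERVED_PREFIX = "__"  # reserved for system metadata, per the rule above


def find_reserved_fields(custom_fields):
    """Return the names in `custom_fields` that use the reserved prefix."""
    return [name for name in custom_fields if name.startswith(RESERVED_PREFIX)]


def validate_custom_fields(custom_fields):
    """Raise ValueError if any custom field name uses the reserved prefix."""
    reserved = find_reserved_fields(custom_fields)
    if reserved:
        raise ValueError(f"reserved field names are not allowed: {reserved}")
    return custom_fields


# A single leading underscore is fine; only the double underscore is reserved.
ok_fields = validate_custom_fields({"title": "Spring campaign", "_draft": True})
```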
Next Steps
Now that you understand the core concepts of Mixpeek, you’re ready to start building with the platform: