Understanding these core concepts will help you get the most out of the Mixpeek Multimodal Warehouse.
Mixpeek Term | Description | Data Warehouse Analogy |
---|---|---|
Namespace | Query boundaries that isolate environments | Database/Schema |
Bucket | Storage containers for raw objects and files | Raw Data Lake/Storage Layer |
Object | Collections of related input files | Raw Data Files/Source Documents |
Blob | Individual raw files within Objects | Binary Data/Single File |
Collection | Groups of processed documents with consistent schema | Table |
Document | Structured outputs from feature extractors | Row |
Feature Extractor | Specialized components that process inputs to extract specific features | ETL Process/Transformation |
Feature | Extracted data elements stored in feature stores | Column/Field |
Feature Store | Specialized storage for extracted features optimized for efficient retrieval | Indexed Columns/Materialized Views |
Retriever | Query engines that search feature stores to find relevant documents | SQL Query Engine |
Retriever Stage | Components of search pipelines that perform specific operations in the retrieval process | Query Execution Plan Step |
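Taxonomy | Similarity joins that enrich documents (flat or hierarchical); the multimodal equivalent of SQL JOIN | JOIN Operation |
Clustering | Groups similar documents and can output centroids/artifacts; the multimodal equivalent of SQL GROUP BY | GROUP BY Operation |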
Research | Multi-step process that explores topics through iterative searches, generates structured reports with sections, and combines retrieved information into cohesive content | Business Intelligence Report |
Component Relationships
The different components in Mixpeek relate to each other in specific ways:
- Buckets contain Objects; Objects are grouped into Batches for processing.
- Submitting a Batch runs Feature Extractors, which produce Documents and populate Feature Stores.
- Documents (features + metadata) live in Collections.
- Taxonomies enrich Documents; Clusters group Documents.
- Retrievers query Feature Stores and return ranked Documents.
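The containment and processing flow listed above can be sketched with plain Python dataclasses. This is only an illustrative model, not the Mixpeek SDK: every class, field, and function name here (`Blob`, `byte_length_extractor`, `submit_batch`, and so on) is a hypothetical stand-in, and the feature store is folded into each document's `features` dict for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the concepts in the table; not the Mixpeek API.

@dataclass
class Blob:
    name: str
    data: bytes

@dataclass
class Object:
    object_id: str
    blobs: list[Blob]

@dataclass
class Bucket:
    name: str
    objects: list[Object] = field(default_factory=list)

@dataclass
class Document:
    source_object_id: str
    features: dict[str, list[float]]
    metadata: dict[str, str]

@dataclass
class Collection:
    name: str
    documents: list[Document] = field(default_factory=list)

def byte_length_extractor(obj: Object) -> Document:
    """Toy extractor: a single 'size' feature holding each blob's byte length."""
    sizes = [float(len(blob.data)) for blob in obj.blobs]
    return Document(obj.object_id, {"size": sizes}, {"blob_count": str(len(obj.blobs))})

def submit_batch(batch: list[Object], collection: Collection) -> None:
    """Run the extractor over every object in the batch and store the documents."""
    for obj in batch:
        collection.documents.append(byte_length_extractor(obj))

bucket = Bucket("raw-media", [
    Object("obj-1", [Blob("a.txt", b"hello"), Blob("b.txt", b"hi")]),
    Object("obj-2", [Blob("c.txt", b"world!")]),
])
collection = Collection("sizes")
submit_batch(bucket.objects, collection)  # here the whole bucket is one batch
```

A real extractor would emit embeddings, transcripts, or detections rather than byte lengths; the point is only the direction of flow from Bucket to Object to Document to Collection.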
Processing Components
Feature Extractors
Specialized components that process inputs to extract features (embeddings, faces, scenes, objects, transcripts). They populate feature stores and produce documents in collections.
Retrievers
Configurable pipelines that combine k-nearest-neighbor (KNN) search over feature stores, metadata filters, grouping, and reranking to return relevant documents.
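To make the idea of staged retrieval concrete, here is a minimal pure-Python sketch that chains a metadata filter, a KNN stage, a grouping stage, and a reranking stage. It does not use or mirror the Mixpeek API: the stage functions, the document shape, and the `fresh` metadata field are all assumptions made for illustration, and the reranker is a toy weighted sum.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filter_stage(docs, key, value):
    """Keep only documents whose metadata[key] equals value."""
    return [d for d in docs if d["metadata"].get(key) == value]

def knn_stage(docs, query, k):
    """Score every document against the query and keep the top k."""
    scored = [{**d, "score": cosine(d["embedding"], query)} for d in docs]
    return sorted(scored, key=lambda d: d["score"], reverse=True)[:k]

def group_stage(docs, key):
    """Keep the best document per metadata[key]; input is already sorted by score."""
    best = {}
    for d in docs:
        best.setdefault(d["metadata"][key], d)
    return list(best.values())

def rerank_stage(docs, weight=0.3):
    """Toy reranker: blend similarity with a hypothetical 'fresh' signal."""
    def blended(d):
        return (1 - weight) * d["score"] + weight * d["metadata"].get("fresh", 0.0)
    return sorted(docs, key=blended, reverse=True)

docs = [
    {"id": "d1", "embedding": [1.0, 0.0], "metadata": {"lang": "en", "video": "v1", "fresh": 0.1}},
    {"id": "d2", "embedding": [0.9, 0.1], "metadata": {"lang": "en", "video": "v1", "fresh": 0.2}},
    {"id": "d3", "embedding": [0.0, 1.0], "metadata": {"lang": "en", "video": "v2", "fresh": 0.9}},
    {"id": "d4", "embedding": [1.0, 0.0], "metadata": {"lang": "fr", "video": "v3", "fresh": 0.5}},
]

candidates = filter_stage(docs, "lang", "en")          # drops d4
candidates = knn_stage(candidates, [1.0, 0.0], k=3)    # d1, d2, d3 by similarity
candidates = group_stage(candidates, "video")          # one hit per video: d1, d3
results = rerank_stage(candidates)
print([d["id"] for d in results])
```

Each stage takes a list of documents and returns a list of documents, which is what lets stages be reordered or swapped without touching the others.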
SQL Analogs
- JOIN → Taxonomies: similarity joins that enrich documents (flat or hierarchical).
- GROUP BY → Clustering: groups similar documents; can output centroids/artifacts.
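The two analogies can be illustrated with a small sketch, under stated assumptions. The join below attaches the nearest label from a flat taxonomy when cosine similarity clears a threshold; the grouping uses greedy threshold ("leader") clustering, chosen here only because it is short, not because Mixpeek is known to use it. All function names, the threshold of 0.8, and the sample data are hypothetical, and hierarchical taxonomies are not covered.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_join(docs, taxonomy, threshold=0.8):
    """JOIN analog: enrich each document with its best-matching taxonomy label.

    Behaves like a left join: unmatched documents are kept with label None.
    """
    enriched = []
    for doc in docs:
        label, score = max(
            ((node["label"], cosine(doc["embedding"], node["embedding"])) for node in taxonomy),
            key=lambda pair: pair[1],
        )
        enriched.append({**doc, "label": label if score >= threshold else None})
    return enriched

def cluster(docs, threshold=0.8):
    """GROUP BY analog: greedy clustering that also tracks a centroid per group."""
    clusters = []
    for doc in docs:
        for c in clusters:
            if cosine(doc["embedding"], c["centroid"]) >= threshold:
                c["members"].append(doc["id"])
                n = len(c["members"])
                # Incremental mean: fold the new vector into the running centroid.
                c["centroid"] = [(m * (n - 1) + x) / n for m, x in zip(c["centroid"], doc["embedding"])]
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append({"members": [doc["id"]], "centroid": list(doc["embedding"])})
    return clusters

docs = [
    {"id": "d1", "embedding": [1.0, 0.0]},
    {"id": "d2", "embedding": [0.9, 0.1]},
    {"id": "d3", "embedding": [0.0, 1.0]},
    {"id": "d4", "embedding": [0.7, 0.7]},
]
taxonomy = [
    {"label": "animal", "embedding": [1.0, 0.0]},
    {"label": "vehicle", "embedding": [0.0, 1.0]},
]

labels = {d["id"]: d["label"] for d in similarity_join(docs, taxonomy)}
groups = [c["members"] for c in cluster(docs)]
print(labels)
print(groups)
```

Unlike an equality-based SQL JOIN or GROUP BY, both operations depend on a similarity threshold, so the same data can yield different matches and groups as the threshold changes.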