Key concepts for using a Multimodal Data Warehouse
Mixpeek Term | Description | Data Warehouse Analogy |
---|---|---|
Namespace | Query boundaries that isolate environments | Database/Schema |
Bucket | Storage containers for raw objects and files | Raw Data Lake/Storage Layer |
Object | Collections of related input files | Raw Data Files/Source Documents |
Blob | Individual raw files within Objects | Binary Data/Single File |
Collection | Groups of processed documents with consistent schema | Table |
Document | Structured outputs from feature extractors | Row |
Feature Extractor | Specialized components that process inputs to extract specific features | ETL Process/Transformation |
Feature | Extracted data elements stored in feature stores | Column/Field |
Feature Store | Specialized storage for extracted features optimized for efficient retrieval | Indexed Columns/Materialized Views |
Retriever | Query engines that search feature stores to find relevant documents | SQL Query Engine |
Retriever Stage | Components of search pipelines that perform specific operations in the retrieval process | Query Execution Plan Step |
Taxonomy | Multimodal equivalent of SQL JOIN operations | JOIN Operation |
Clustering | Multimodal equivalent of SQL GROUP BY operations | GROUP BY Operation |
Research | Multi-step process that explores topics through iterative searches, generates structured reports with sections, and combines retrieved information into cohesive content | Business Intelligence Report |
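
To make the relationships in the table concrete, here is a minimal sketch of the storage and processing terms as plain Python dataclasses. It is not the Mixpeek SDK: every class, field, and function name below (`Blob`, `Object`, `Bucket`, `Document`, `Collection`, `extract_documents`) is a hypothetical stand-in chosen for illustration, and the "feature extractor" simply copies blob metadata rather than performing real extraction.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Blob:
    """An individual raw file within an Object."""
    name: str
    content_type: str


@dataclass
class Object:
    """A collection of related input files (Blobs)."""
    object_id: str
    blobs: List[Blob] = field(default_factory=list)


@dataclass
class Bucket:
    """A storage container for raw Objects."""
    name: str
    objects: List[Object] = field(default_factory=list)


@dataclass
class Document:
    """Structured output of a feature extractor (a 'row')."""
    document_id: str
    source_object_id: str
    features: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Collection:
    """A group of processed Documents with a consistent schema (a 'table')."""
    name: str
    documents: List[Document] = field(default_factory=list)


def extract_documents(obj: Object) -> List[Document]:
    """Toy feature extractor (assumption): emits one Document per Blob."""
    return [
        Document(
            document_id=f"{obj.object_id}:{i}",
            source_object_id=obj.object_id,
            features={"blob_name": b.name, "content_type": b.content_type},
        )
        for i, b in enumerate(obj.blobs)
    ]


# Usage: a Bucket holds raw Objects; a Collection holds the extracted Documents.
obj = Object("obj-1", [Blob("clip.mp4", "video/mp4"), Blob("notes.txt", "text/plain")])
bucket = Bucket("raw-media", [obj])
collection = Collection("media-docs", extract_documents(obj))
```

Each Document keeps a reference back to its source Object, which mirrors how a warehouse row can be traced to the raw file it was derived from.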
Bucket → Object Relationship
Object → Document Relationship
Document → Feature Relationship
Collection → Document Relationship
Flat Taxonomies
Hierarchical Taxonomies
Storage Layer (Buckets)
Processing Flow (Feature Extractors)
Feature Storage
Retrieval Flow (Retrieval Pipelines)
Field names that begin with a double underscore (`__`) are reserved for system metadata. Do not use this prefix for your custom fields.
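
A client-side check can catch reserved names before data is sent. The sketch below assumes custom fields are passed around as a plain dictionary; the helper names `find_reserved_fields` and `validate_custom_fields` are hypothetical and not part of any Mixpeek API.

```python
from typing import Any, Dict, List

# The prefix reserved for system metadata.
RESERVED_PREFIX = "__"


def find_reserved_fields(fields: Dict[str, Any]) -> List[str]:
    """Return the field names that start with the reserved prefix, sorted."""
    return sorted(name for name in fields if name.startswith(RESERVED_PREFIX))


def validate_custom_fields(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError if any custom field uses the reserved prefix."""
    reserved = find_reserved_fields(fields)
    if reserved:
        raise ValueError(
            f"Field names starting with {RESERVED_PREFIX!r} are reserved: {reserved}"
        )
    return fields
```

Calling `validate_custom_fields({"title": "demo"})` returns the dictionary unchanged, while a key such as `"__internal"` raises a `ValueError`.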