Concepts

Understanding these concepts will help you use the Mixpeek Multimodal Warehouse effectively.

Mixpeek organizes data in a structured hierarchy designed for flexible, high-performance processing and retrieval of multimodal content.

Mixpeek Term	Description	Data Warehouse Analogy
Namespace	Query boundaries that isolate environments	Database/Schema
Bucket	Storage containers for raw objects and files	Raw Data Lake/Storage Layer
Object	Collections of related input files	Raw Data Files/Source Documents
Blob	Individual raw files within Objects	Binary Data/Single File
Collection	Groups of processed documents with consistent schema	Table
Document	Structured outputs from feature extractors	Row
Feature Extractor	Specialized components that process inputs to extract specific features	ETL Process/Transformation
Feature	Extracted data elements stored in feature stores	Column/Field
Feature Store	Specialized storage for extracted features optimized for efficient retrieval	Indexed Columns/Materialized Views
Retriever	Query engines that search feature stores to find relevant documents	SQL Query Engine
Retriever Stage	Components of search pipelines that perform specific operations in the retrieval process	Query Execution Plan Step
Taxonomy	Multimodal equivalent of SQL JOIN operations	JOIN Operation
Clustering	Multimodal equivalent of SQL GROUP BY operations	GROUP BY Operation
Research	Multi-step process that explores topics through iterative searches, generates structured reports with sections, and combines retrieved information into cohesive content	Business Intelligence Report

Component Relationships

Mixpeek's components relate to each other as follows:

Understanding the Relationships

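Bucket → Object Relationship

A bucket stores raw objects, and each object groups related blobs (individual raw files), such as the video, script, and legal documents of a marketing campaign.

Object → Document Relationship

Feature extractors process an object's files into documents, and each document keeps a reference to its source object in the source_object_id field.

Document → Feature Relationship

A document holds the structured outputs of feature extractors. The individual features are stored in feature stores, and each feature keeps a reference to its parent document.

Collection → Document Relationship

A collection groups processed documents that share a consistent schema, much as a table groups rows.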

Processing Components

Feature Extractors

Specialized components that process inputs to extract specific features, such as embeddings, detected objects, or transcriptions.
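To make the idea of an extractor concrete, here is a minimal sketch assuming a simple interface in which an extractor declares which blobs it supports and returns a dictionary of named features. The `Blob` model, the `FeatureExtractor` protocol, `TextStatsExtractor`, and `run_extractors` are all hypothetical illustrations, not part of Mixpeek's API; the toy word/character counts stand in for real features like embeddings or transcriptions.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class Blob:
    """A single raw file inside an object (hypothetical model)."""
    name: str
    content_type: str
    data: bytes


class FeatureExtractor(Protocol):
    """Hypothetical interface: an extractor turns one blob into named features."""
    name: str

    def supports(self, blob: Blob) -> bool: ...

    def extract(self, blob: Blob) -> dict[str, Any]: ...


class TextStatsExtractor:
    """Toy stand-in for a real extractor (e.g. transcription or embedding)."""
    name = "text_stats"

    def supports(self, blob: Blob) -> bool:
        return blob.content_type.startswith("text/")

    def extract(self, blob: Blob) -> dict[str, Any]:
        text = blob.data.decode("utf-8")
        return {"word_count": len(text.split()), "char_count": len(text)}


def run_extractors(blob: Blob, extractors: list[FeatureExtractor]) -> dict[str, Any]:
    """Apply every extractor that supports the blob and merge their features."""
    features: dict[str, Any] = {}
    for extractor in extractors:
        if extractor.supports(blob):
            # Prefix keys with the extractor name so features never collide.
            for key, value in extractor.extract(blob).items():
                features[f"{extractor.name}.{key}"] = value
    return features
```

Blobs an extractor does not support are simply skipped, so a video blob passed through `TextStatsExtractor` yields no features.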

Retrievers

Query engines that search feature stores to find relevant documents.
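As a rough illustration of what a retriever does, the sketch below scores a query embedding against every stored embedding with cosine similarity and returns the best matches. It assumes a hypothetical feature store laid out as a plain dictionary from document ID to one embedding; a production feature store would typically use an index (for example an approximate nearest-neighbour index) instead of this linear scan. The function names are illustrative only.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: list[float],
    feature_store: dict[str, list[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return up to top_k (document_id, score) pairs, best first.

    feature_store maps a document ID to a single embedding (hypothetical layout).
    """
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in feature_store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For a store `{"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [0.7, 0.7]}` and the query `[1.0, 0.0]`, the top two results are `doc_a` followed by `doc_c`.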

Multimodal Analogs to SQL Operations

Mixpeek provides specialized components that function as multimodal analogs to traditional SQL operations:

Taxonomies

Taxonomies in Mixpeek serve as the multimodal equivalent of SQL JOIN operations. They allow you to enrich documents with metadata from other collections based on feature similarity rather than exact key matches.
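The difference between a key-based JOIN and a similarity-based one can be sketched as follows. This is a minimal illustration, not Mixpeek's taxonomy implementation: each document is matched to the taxonomy node whose embedding is most similar, and that node's metadata is attached only if the similarity clears a threshold. The `enrich` function, the dictionary shapes, and the 0.8 threshold are all assumptions chosen for the example.

```python
import math
from typing import Any


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enrich(
    documents: list[dict[str, Any]],
    taxonomy: list[dict[str, Any]],
    threshold: float = 0.8,  # hypothetical cut-off for a "match"
) -> list[dict[str, Any]]:
    """Attach the metadata of the most similar taxonomy node to each document."""
    enriched = []
    for doc in documents:
        best_score, best_metadata = -1.0, None
        for node in taxonomy:
            score = cosine(doc["embedding"], node["embedding"])
            if score > best_score:
                best_score, best_metadata = score, node["metadata"]
        # Join on similarity rather than key equality; weak matches get None,
        # so every document is kept (similar in spirit to a LEFT JOIN).
        match = best_metadata if best_score >= threshold else None
        enriched.append({**doc, "taxonomy": match})
    return enriched
```

Keeping unmatched documents with `taxonomy: None` is a design choice of this sketch; dropping them instead would mimic an inner join.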

Flat Taxonomies

Hierarchical Taxonomies

Data Flow Architecture

Storage Layer (Buckets)

Raw objects and their associated files are stored in buckets. Objects represent collections of related files (e.g., a marketing campaign with video, script, and legal documents).
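The containment described above (a bucket holds objects, and each object holds its related files) can be modelled with a few dataclasses. This is a hypothetical sketch for illustration only; `Blob`, `BucketObject`, `Bucket`, the bucket name, and the object ID are invented and do not reflect Mixpeek's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Blob:
    """One raw file inside an object (hypothetical model)."""
    name: str
    content_type: str


@dataclass
class BucketObject:
    """A group of related blobs that belong together (hypothetical model)."""
    object_id: str
    blobs: list[Blob] = field(default_factory=list)


@dataclass
class Bucket:
    """Storage container holding objects keyed by ID (hypothetical model)."""
    name: str
    objects: dict[str, BucketObject] = field(default_factory=dict)

    def put(self, obj: BucketObject) -> None:
        self.objects[obj.object_id] = obj

    def blob_names(self, object_id: str) -> list[str]:
        return [blob.name for blob in self.objects[object_id].blobs]


# The marketing-campaign example from the text: one object, three related files.
bucket = Bucket("marketing-assets")
bucket.put(
    BucketObject(
        "obj_campaign",
        [
            Blob("teaser.mp4", "video/mp4"),
            Blob("script.txt", "text/plain"),
            Blob("legal_review.pdf", "application/pdf"),
        ],
    )
)
```

Grouping the files under one object keeps them together for later processing, rather than treating the video, script, and legal review as unrelated uploads.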

Processing Flow (Feature Extractors)

Objects from buckets are processed by feature extractors, which extract features from each object's files. These features are then organized into documents that are stored in collections.
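The flow can be sketched as a function that runs a set of extractors over one object's files and assembles the results into a document, filling in the system metadata fields described under Metadata and Document Properties. This is a minimal sketch assuming one document per object and extractors that are plain callables; `process_object`, `PIPELINE_VERSION`, and the failure-handling policy are hypothetical, not Mixpeek's processing engine.

```python
from typing import Any, Callable

PIPELINE_VERSION = 1  # hypothetical pipeline version number

# Hypothetical extractor shape: takes the object's files, returns one feature value.
Extractor = Callable[[dict[str, bytes]], Any]


def process_object(
    object_id: str,
    files: dict[str, bytes],
    extractors: dict[str, Extractor],
) -> dict[str, Any]:
    """Run each extractor on an object's files and assemble one document."""
    document: dict[str, Any] = {"source_object_id": object_id}
    missing: list[str] = []
    for feature_name, extract in extractors.items():
        try:
            document[feature_name] = extract(files)
        except Exception:
            # Record the failed feature instead of aborting the whole object.
            missing.append(feature_name)
    document["__missing_features"] = missing
    document["__fully_enriched"] = not missing
    document["__pipeline_version"] = PIPELINE_VERSION
    return document
```

Recording failures in `__missing_features` rather than raising means a partially processed object still produces a usable document that can be retried later.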

Feature Storage

Extracted features are stored in specialized feature stores. Each feature maintains a reference to its parent document, and each document maintains a reference to its source object.
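Because each feature points to its parent document and each document points to its source object, any stored feature can be traced back to the raw files it came from. The sketch below shows that two-hop lookup under simple assumptions: a hypothetical `Feature` record with a `document_id` back-reference and documents held in a dictionary keyed by ID. Neither structure is Mixpeek's real storage format.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Feature:
    """A stored feature with a back-reference to its parent document (hypothetical)."""
    feature_id: str
    document_id: str
    value: Any


def trace_to_source(
    feature: Feature,
    documents: dict[str, dict[str, Any]],
) -> tuple[str, str]:
    """Follow feature -> document -> object; return (document_id, source_object_id)."""
    parent = documents[feature.document_id]
    return feature.document_id, parent["source_object_id"]
```

For example, a feature whose `document_id` is `"doc_1"`, where `doc_1` has `source_object_id` `"obj_123abc"`, traces to `("doc_1", "obj_123abc")`.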

Retrieval Flow (Retrieval Pipelines)

Queries are processed through retrieval pipelines that search feature stores for relevant features. Each matching feature is then used to locate its parent document in a collection.
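A retrieval pipeline can be pictured as a chain of retriever stages, each taking a list of hits and returning a new one, with a stage that collapses feature-level hits into their parent documents. The sketch below assumes hits arrive as `(feature_id, document_id, score)` tuples from an earlier search step; the stage names `min_score`, `best_per_document`, and `limit` are hypothetical and not Mixpeek stage types.

```python
from typing import Callable

Hit = tuple[str, str, float]  # (feature_id, document_id, score)
Stage = Callable[[list[Hit]], list[Hit]]


def min_score(threshold: float) -> Stage:
    """Stage that drops hits scoring below the threshold."""
    return lambda hits: [hit for hit in hits if hit[2] >= threshold]


def best_per_document(hits: list[Hit]) -> list[Hit]:
    """Stage that keeps one hit per parent document (the best-scoring one)."""
    best: dict[str, Hit] = {}
    for hit in hits:
        doc_id = hit[1]
        if doc_id not in best or hit[2] > best[doc_id][2]:
            best[doc_id] = hit
    return sorted(best.values(), key=lambda hit: hit[2], reverse=True)


def limit(n: int) -> Stage:
    """Stage that truncates the result list to n hits."""
    return lambda hits: hits[:n]


def run_pipeline(hits: list[Hit], stages: list[Stage]) -> list[str]:
    """Apply stages in order and return the resulting document IDs."""
    for stage in stages:
        hits = stage(hits)
    return [doc_id for _, doc_id, _ in hits]
```

With hits `f1→doc_a (0.92)`, `f2→doc_a (0.85)`, `f3→doc_b (0.88)`, and `f4→doc_c (0.40)`, the stages `[min_score(0.5), best_per_document, limit(2)]` return `["doc_a", "doc_b"]`: two features of `doc_a` matched, but the document appears only once.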

Metadata and Document Properties

All documents in Mixpeek collections include standard metadata properties:

{
  "__fully_enriched": true,           // Indicates if all expected features have been extracted
  "__missing_features": [],           // Lists any features that failed to extract
  "__pipeline_version": 1,            // Version of the pipeline that processed this document
  "source_object_id": "obj_123abc"    // Reference to the source object in a bucket
  // Additional document-specific fields...
}

Fields prefixed with double underscores (__) are reserved for system metadata. Do not use this prefix for your custom fields.
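Two checks follow naturally from these fields: rejecting custom field names that use the reserved prefix, and finding documents that may need reprocessing. The sketch below is a hypothetical client-side helper, not a Mixpeek function; in particular, treating a document as stale when its `__pipeline_version` is older than the current version is an assumed policy chosen for illustration.

```python
from typing import Any

RESERVED_PREFIX = "__"


def check_custom_fields(fields: dict[str, Any]) -> None:
    """Raise ValueError if any user-supplied field name uses the reserved prefix."""
    reserved = sorted(name for name in fields if name.startswith(RESERVED_PREFIX))
    if reserved:
        raise ValueError(f"reserved field names: {reserved}")


def needs_reprocessing(documents: list[dict[str, Any]], current_version: int) -> list[str]:
    """Source object IDs whose documents are incomplete or from an older pipeline."""
    return [
        doc["source_object_id"]
        for doc in documents
        if not doc["__fully_enriched"] or doc["__pipeline_version"] < current_version
    ]
```

Returning source object IDs (rather than document IDs) reflects that reprocessing starts from the raw object in its bucket.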

Next Steps

Now that you understand the core concepts of Mixpeek, you’re ready to start building with the platform:

Quickstart Guide

Get started with your first Mixpeek project

Data Management

Learn how to organize and manage your data
