Welcome to the Mixpeek documentation! We're excited to help you build multimodal understanding applications.

Before we dive in, let’s quickly define what a Multimodal Warehouse is:

Multimodal Warehouse: A specialized data platform that ingests, processes, and enables retrieval across diverse media types (text, images, videos, audio, PDFs) by extracting features and storing them in optimized collections and feature stores.

It serves as the foundation for building AI-powered search and analysis applications that can work seamlessly across different content types.

Within Mixpeek, developers have access to pre-built feature extraction pipelines and retrieval stages that enable ad-hoc search and discovery across any file type.

Unified Multimodal Platform

  • Process and analyze any content type with a single platform
  • Example: Finding unique faces in videos that mention topics present in PDFs

Custom Processing Pipelines

  • Define exactly how your content is processed with customizable feature extraction
  • Example: Creating a pipeline that extracts speaker identity, sentiment, and product mentions from customer call recordings
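
The call-recording example above could be declared as an ordered list of extractor stages. The sketch below is illustrative only: the endpoint path, payload fields, and extractor names are hypothetical placeholders, not Mixpeek's actual API:

import requests

# Hypothetical pipeline definition for the call-recording example.
# Endpoint, field names, and extractor identifiers are illustrative
# placeholders, not Mixpeek's real API surface.
pipeline = {
    "name": "customer-call-analysis",
    "source": "call-recordings-bucket",
    "extractors": [
        {"type": "speaker_identity"},  # who is speaking in each segment
        {"type": "sentiment"},         # per-segment sentiment scores
        {"type": "entity_mentions", "params": {"entities": ["product"]}},
    ],
}

resp = requests.post(
    "https://api.example.com/v1/pipelines",  # placeholder URL
    json=pipeline,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
resp.raise_for_status()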

Advanced Retrieval

  • Combine vector similarity with metadata filtering for precise results across modalities
  • Example: Searching for “marketing videos featuring outdoor scenes that align with our brand guidelines document”
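
The underlying idea, vector similarity constrained by structured metadata filters, can be sketched in plain Python. The embeddings below are stand-ins; a real deployment would embed the query and documents with a multimodal model:

import numpy as np

# Toy corpus: each document carries an embedding plus structured metadata.
docs = [
    {"id": "vid_1", "embedding": np.array([0.9, 0.1, 0.0]),
     "meta": {"type": "video", "tags": ["outdoor"]}},
    {"id": "vid_2", "embedding": np.array([0.2, 0.9, 0.1]),
     "meta": {"type": "video", "tags": ["studio"]}},
    {"id": "doc_1", "embedding": np.array([0.8, 0.2, 0.1]),
     "meta": {"type": "pdf", "tags": ["brand-guidelines"]}},
]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_search(query_emb, must_match, k=5):
    # Filter on metadata first, then rank the survivors by similarity.
    candidates = [d for d in docs
                  if all(d["meta"].get(key) == val
                         for key, val in must_match.items())]
    return sorted(candidates,
                  key=lambda d: cosine(query_emb, d["embedding"]),
                  reverse=True)[:k]

# "Outdoor marketing videos": restrict to videos, rank by similarity.
query = np.array([1.0, 0.0, 0.0])  # stand-in for an embedded text query
for doc in hybrid_search(query, {"type": "video"}):
    print(doc["id"], round(cosine(query, doc["embedding"]), 3))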

Data Organization

  • Apply taxonomies and clustering to bring structure to unstructured content
  • Example: Automatically organizing product images into categories based on visual similarity and metadata from product descriptions
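
As a rough illustration of the clustering half of the product-image example, here is a minimal sketch using scikit-learn's KMeans; the embeddings are stand-ins for features a real extractor would produce:

import numpy as np
from sklearn.cluster import KMeans

# Stand-in image embeddings; in practice these come from a feature
# extractor run over the product images.
embeddings = np.array([
    [0.90, 0.10], [0.85, 0.15],  # visually similar pair (e.g., sneakers)
    [0.10, 0.90], [0.05, 0.95],  # another similar pair (e.g., boots)
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for i, label in enumerate(labels):
    print(f"image_{i} -> cluster {label}")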

Always State-of-the-Art

Continuous Improvement

Mixpeek manages all feature extractors and retriever stages, continuously updating them to incorporate the latest advancements in AI and ML.

Tightly Coupled

Our extractors and retrievers are designed as an integrated system. This tight coupling enables advanced techniques like late interaction models (e.g., ColBERT).
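
For intuition: late interaction keeps one embedding per token and scores a document by summing, over query tokens, each token's best match among document tokens (MaxSim). A minimal sketch with stand-in embeddings:

import numpy as np

def maxsim_score(query_tokens, doc_tokens):
    # ColBERT-style late interaction: for each query token embedding,
    # take its maximum similarity over all document token embeddings,
    # then sum those maxima into a single document score.
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sim = q @ d.T  # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

# Stand-in token embeddings; a real model produces these per token.
rng = np.random.default_rng(0)
query = rng.random((4, 128))
doc = rng.random((50, 128))
print(maxsim_score(query, doc))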

Unlike traditional solutions that require rebuilding your entire index when upgrading to new models, Mixpeek performs upgrades seamlessly behind the scenes—keeping you on the cutting edge without disruption.

How It Works

1. Upload Objects: Upload and store your multimodal data in buckets, organizing related files as objects.

2. Extract Features: Extract features using custom pipelines with specialized extractors for different content types.

3. Enrich Documents: Apply taxonomies (joins) and clustering (groups) to categorize and group related content.

4. Retrieve & Analyze: Search and retrieve content using advanced multimodal search capabilities.
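
Taken together, the four steps map naturally onto a small script. The sketch below assumes a generic REST surface; the base URL, endpoint paths, and payloads are hypothetical stand-ins rather than Mixpeek's documented API:

import requests

BASE = "https://api.example.com/v1"  # placeholder base URL, not Mixpeek's
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# 1) Upload: register an object (a group of related files) in a bucket.
obj = requests.post(f"{BASE}/buckets/products/objects", headers=HEADERS, json={
    "blobs": [{"type": "image", "url": "https://example.com/front_view.jpg"}],
}).json()

# 2) Extract: run a feature-extraction pipeline over the new object.
requests.post(f"{BASE}/pipelines/product-features/run",
              headers=HEADERS, json={"object_id": obj["id"]})

# 3) Enrich: attach a taxonomy node and a cluster assignment.
requests.post(f"{BASE}/objects/{obj['id']}/enrich", headers=HEADERS,
              json={"taxonomy": "footwear/sneakers", "cluster": "visual-similarity"})

# 4) Retrieve: multimodal search with a metadata filter.
results = requests.post(f"{BASE}/search", headers=HEADERS, json={
    "query": "red running shoes in outdoor lighting",
    "filters": {"type": "image"},
}).json()
print(results)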

Getting Started

The fastest way to start using Mixpeek is to follow our Quickstart Guide, which walks you through setting up your first project.

For a deeper understanding of how Mixpeek works, check out our Core Concepts page.

Common Use Cases

Cross-Modal Search Operations

Organize and store content to enable efficient search across different modalities:

Storage Pattern

  • Group related images, videos, and text documents in single objects
  • Store raw files alongside their extracted features
  • Maintain indexes for cross-modal querying

Example Object Structure

{
  "product_id": "shoe_123",
  "blobs": [
    {"type": "image", "url": "front_view.jpg"},
    {"type": "image", "url": "side_view.jpg"},
    {"type": "text", "url": "description.txt"},
    {"type": "video", "url": "rotation.mp4"}
  ]
}
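
One practical consequence of this grouping is that a retriever can fan a single query out across every blob in an object and join results on product_id. A small illustration in plain Python, reusing the structure above:

from collections import defaultdict

# The object structure from above, loaded as a plain dict.
object_record = {
    "product_id": "shoe_123",
    "blobs": [
        {"type": "image", "url": "front_view.jpg"},
        {"type": "image", "url": "side_view.jpg"},
        {"type": "text", "url": "description.txt"},
        {"type": "video", "url": "rotation.mp4"},
    ],
}

# Index blobs by modality so a query can target each one appropriately.
by_type = defaultdict(list)
for blob in object_record["blobs"]:
    by_type[blob["type"]].append(blob["url"])

# e.g., pair every image with the text description for joint scoring.
pairs = [(img, txt) for img in by_type["image"] for txt in by_type["text"]]
print(pairs)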

Each use case leverages Mixpeek’s ability to process and understand relationships across different content types - from text and images to video and audio - providing a unified view of your data.

Ready to get started? Create your first project →