Mixpeek is a platform for processing and searching multimodal content—text, images, video, audio, and documents. Think of it as a unified API that handles everything from extracting features with ML models to searching across them with semantic queries.

Decompose with Extractors

Break complex objects into semantic layers. A single video becomes searchable transcripts, visual embeddings, scene descriptions, and detected entities—each layer independently queryable.

Breaking down a video into semantic layers

Example: Upload a meeting recording → Whisper extracts the transcript, CLIP generates visual embeddings per frame, and entity detection identifies speakers and topics. This unlocks:
  • Search within any modality (find spoken words, visual moments, or document sections)
  • Extract structured data from unstructured content
  • Build multiple indexes from a single source file
Learn about Feature Extractors →
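To make the fan-out concrete, here is a minimal sketch of one video decomposed into independent layers, modeled as plain dicts. The field names and extractor labels are illustrative assumptions, not Mixpeek's actual document schema.

```python
# Hypothetical sketch: one source video fans out into semantic layers,
# each of which becomes its own independently searchable document.
video = {"object_id": "obj_123", "url": "s3://bucket/meeting.mp4"}

layers = {
    "transcript": {"extractor": "whisper", "segments": []},        # spoken words
    "visual_embeddings": {"extractor": "clip", "vectors": []},     # per-frame vectors
    "entities": {"extractor": "entity-detection", "items": []},    # speakers, topics
}

# Each layer keeps a pointer back to the source object for lineage.
documents = [
    {"source_object": video["object_id"], "layer": name, **data}
    for name, data in layers.items()
]
```

Because each layer is a separate document, a query can target just the transcript, just the visual embeddings, or both, without touching the raw video.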

Recompose with Retrievers

Reassemble layers based on semantic relevance. Chain search stages, apply filters across modalities, and enrich results—turning decomposed content back into meaningful answers.

Chaining search stages to recompose results

Example: Query “product demo with Sarah” → vector search finds relevant transcript segments, face detection filters for Sarah, visual similarity ranks by product imagery, and enrichment adds full context. This unlocks:
  • Multi-stage retrieval pipelines (search → filter → rank → enrich)
  • Cross-modal queries (text query finds video moments)
  • Dynamic result enrichment without re-indexing
Learn about Retrievers →
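The search → filter → rank chain above can be sketched as a sequence of stages applied to a result set. This is a toy illustration of the pattern, not Mixpeek's configuration schema; the stage functions and document fields are assumptions.

```python
# Illustrative multi-stage retrieval pipeline: each stage takes the
# query and the current result set and returns a narrowed/reordered set.
def run_pipeline(query, documents, stages):
    results = documents
    for stage in stages:
        results = stage(query, results)
    return results

def vector_search(query, docs):
    return [d for d in docs if d["score"] > 0.5]        # keep relevant hits

def face_filter(query, docs):
    return [d for d in docs if "sarah" in d["tags"]]    # filter by detected face

def rank(query, docs):
    return sorted(docs, key=lambda d: d["score"], reverse=True)

docs = [
    {"id": 1, "score": 0.9, "tags": ["sarah", "demo"]},
    {"id": 2, "score": 0.7, "tags": ["demo"]},
    {"id": 3, "score": 0.8, "tags": ["sarah"]},
]
top = run_pipeline("product demo with Sarah", docs,
                   [vector_search, face_filter, rank])
# top contains ids 1 and 3, highest score first
```

The key design point is that stages compose: adding an enrichment step is just appending another function to the list, with no re-indexing of the underlying documents.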

How It Works

The platform has three main pieces that work together:
1. Ingestion

Store your raw files and data as Objects in Buckets. Objects can be videos, PDFs, images, audio files, or JSON—whatever you’re working with.
2. Processing

Create Collections that run ML models (CLIP, Whisper, LayoutLM, etc.) on your objects. The models extract embeddings and structured data, and the results are stored as searchable documents.
3. Retrieval

Build Retrievers to search across your documents. Chain multiple search stages together, apply filters, and enrich results—all through configuration, no code.

Example Flow

# 1. Create a bucket and register an object
POST /v1/buckets/{bucket_id}/objects
{ "key_prefix": "/meetings", "blobs": [{ "property": "video", "url": "s3://..." }] }

# 2. Create a collection with feature extractors
POST /v1/collections
{ "collection_name": "meetings", "feature_extractor": { "feature_extractor_name": "video-descriptor", "version": "v1" } }

# 3. Execute a retriever
POST /v1/retrievers/{retriever_id}/execute
{ "inputs": { "query_text": "Q4 roadmap discussion" }, "limit": 10 }

Key Features

Multi-Tenant by Default

Every request includes a namespace header. Keep customers, environments, and projects completely isolated.
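As a sketch, tenant isolation just means stamping every request with a namespace header. The header name `X-Namespace` and the key format here are assumptions for illustration; check the API reference for the exact header.

```python
# Hypothetical helper: every API call carries the tenant's namespace,
# keeping customers, environments, and projects isolated.
def build_headers(api_key, namespace):
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Namespace": namespace,  # assumed header name, shown for illustration
    }

headers = build_headers("sk_test_...", "customer-acme-prod")
```

Switching namespaces (say, from `customer-acme-prod` to `customer-acme-staging`) changes which isolated data set every subsequent call sees, with no other code changes.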

Configuration Over Code

Define pipelines, extractors, and retrievers with JSON configs. No infrastructure to manage, no model deployments to worry about.

Enrichment & Discovery

Add taxonomies for classification or clusters for content discovery—both use the same semantic JOIN primitives.

Built for Production

Analytics, caching, webhooks, and monitoring built in. Track every request, A/B test retrieval strategies, and roll back configs when needed.

What You Get

  • No infrastructure work: No Ray clusters to manage, no model serving to configure, no vector DB ops
  • Mix any models: CLIP for images, Whisper for audio, LayoutLM for documents—use them together in the same collection
  • Semantic JOINs: Connect documents across collections using vector similarity instead of foreign keys
  • Complete lineage: Trace any result back through document → object → source file
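To illustrate the semantic JOIN idea from the list above: instead of matching rows on a shared foreign key, documents from two collections are paired when their embeddings are close enough. This is a toy pure-Python sketch; the vectors, threshold, and field names are assumptions, not Mixpeek's implementation.

```python
# Toy "semantic JOIN": pair documents across collections by cosine
# similarity of their embedding vectors rather than a foreign key.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def semantic_join(left, right, threshold=0.9):
    pairs = []
    for l in left:
        best = max(right, key=lambda r: cosine(l["vector"], r["vector"]))
        if cosine(l["vector"], best["vector"]) >= threshold:
            pairs.append((l["id"], best["id"]))
    return pairs

videos = [{"id": "v1", "vector": [1.0, 0.0]}]
docs = [{"id": "d1", "vector": [0.99, 0.1]},
        {"id": "d2", "vector": [0.0, 1.0]}]
# v1 joins to d1 because their embeddings point in nearly the same direction
```

The same primitive backs both taxonomies (classify a document by joining it to labeled nodes) and clusters (group documents that join to each other).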

Next Steps