Retrievers

Retrievers are Mixpeek's core search components: customizable pipelines for searching across your multimodal content.

Overview

Retrievers in Mixpeek are configurable search pipelines that allow you to search across your processed content using a combination of vector similarity, metadata filtering, and other search techniques. They provide a flexible way to build sophisticated search experiences tailored to your specific use cases.

Define Retriever Query Schema

Specify the structure of queries that your retriever will accept, including required and optional parameters.

Select Stages

Choose which retrieval stages to include in your pipeline, such as vector search, filtering, reranking, or fusion stages.

Configure Inputs and Outputs

For each stage, define how it receives inputs from previous stages and how its outputs will be passed to subsequent stages.

Save Retriever

Save your configured retriever to make it available for queries within your namespace.

Execute Query

Run search operations using your retriever with queries that match the defined schema structure.
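
The five steps above can be sketched locally in plain Python. This is a minimal illustration under stated assumptions, not the Mixpeek SDK: `QUERY_SCHEMA`, `validate_query`, `run_pipeline`, the stage functions, and the sample documents are all hypothetical, and a simple keyword match stands in for a real vector search stage.

```python
# Hypothetical in-memory model of the retriever workflow.
# None of these names come from the Mixpeek SDK.

# Step 1: query schema, mapping parameter name -> (type, required).
QUERY_SCHEMA = {"text": (str, True), "max_price": (float, False)}

DOCUMENTS = [
    {"document_id": "doc_1", "title": "Intro to CNNs", "price": 30.0},
    {"document_id": "doc_2", "title": "Gardening basics", "price": 15.0},
    {"document_id": "doc_3", "title": "CNNs in production", "price": 120.0},
]


def validate_query(query, schema):
    """Reject queries with missing, unknown, or wrongly typed parameters."""
    for name, (expected_type, required) in schema.items():
        if name not in query:
            if required:
                raise ValueError(f"missing required parameter: {name}")
            continue
        if not isinstance(query[name], expected_type):
            raise ValueError(f"parameter {name} must be {expected_type.__name__}")
    unknown = set(query) - set(schema)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")


# Steps 2 and 3: every stage takes (query, documents) and returns documents,
# so the output of one stage is the input of the next.
def keyword_stage(query, docs):
    # Stand-in for a vector search stage: match any query term in the title.
    terms = query["text"].lower().split()
    return [d for d in docs if any(t in d["title"].lower() for t in terms)]


def price_filter_stage(query, docs):
    # Optional metadata filter driven by an optional query parameter.
    if "max_price" not in query:
        return docs
    return [d for d in docs if d["price"] <= query["max_price"]]


# Step 4: "saving" here just means registering the stage list under a name.
RETRIEVERS = {"simple-search": [keyword_stage, price_filter_stage]}


# Step 5: execute a query that matches the schema.
def run_pipeline(name, query):
    validate_query(query, QUERY_SCHEMA)
    docs = DOCUMENTS
    for stage in RETRIEVERS[name]:
        docs = stage(query, docs)
    return docs


results = run_pipeline("simple-search", {"text": "cnns", "max_price": 50.0})
print([d["document_id"] for d in results])
```

Validating the query before any stage runs means a malformed request fails fast, with an error naming the offending parameter, instead of producing an empty or misleading result set.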

Search Pipelines

Create multi-stage search pipelines that combine different search techniques.

Multimodal Retrieval

Search across text, images, videos, and other content types seamlessly.

Key Concepts

Schema Validation

Pipeline Stages

Stage Types

Input/Output Flow

Query Parameters

Retriever Architecture

Creating a Basic Retriever

from mixpeek import Mixpeek

mp = Mixpeek(api_key="YOUR_API_KEY")

# Create a basic retriever with a single vector search stage
retriever = mp.retrievers.create(
    namespace_id="ns_abc123",
    name="simple-search",
    description="Basic text search across documents",
    stages=[
        {
            "name": "vector_search",
            "type": "vector",
            "collection_id": "col_def456",
            "index": "text",
            "limit": 20
        }
    ]
)

retriever_id = retriever["retriever_id"]
print(f"Created retriever: {retriever_id}")

Searching with a Retriever

Once you’ve created a retriever, you can use it to search your content:

# Search using a text query
results = mp.retrievers.execute(
    retriever_id=retriever_id,
    query={
        "text": "machine learning algorithms for image classification"
    }
)

# Display results
for result in results["results"]:
    print(f"Document: {result['document_id']}")
    print(f"Score: {result['score']}")
    print(f"Title: {result.get('title', 'N/A')}")
    print("---")

Query Parameters

Different retriever stages can utilize different query parameters:

Text Queries

Image Queries

Hybrid Queries

Vector Queries

Retriever Use Cases

Content-Based Semantic Search

Retrieve documents based on meaning rather than exact keyword matches.

Implementation Pattern

Use embedding-based retrievers for semantic understanding
Optimize for capturing conceptual relationships
Configure appropriate similarity thresholds
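
To make the role of a similarity threshold concrete, here is a minimal sketch of embedding-based matching in plain Python. It assumes cosine similarity as the metric and uses toy 3-dimensional vectors. The function names, the vectors, and the 0.8 threshold are illustrative assumptions, not Mixpeek defaults.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, doc_vecs, threshold=0.8):
    """Return (doc_id, score) pairs at or above the threshold, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy embeddings: the first two documents point in nearly the same
# direction as the query, the third is almost orthogonal to it.
doc_vecs = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_kittens": [0.8, 0.2, 0.1],
    "doc_finance": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]

hits = semantic_search(query_vec, doc_vecs, threshold=0.8)
for doc_id, score in hits:
    print(doc_id, round(score, 3))
```

Raising the threshold trades recall for precision: fewer documents pass, but those that do sit closer to the query in embedding space.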

Filters and Query Operators

Numeric and Date Comparisons

Operators for comparing numeric values and dates.

Available Operators

eq - Equal to
neq - Not equal to
gt - Greater than
gte - Greater than or equal to
lt - Less than
lte - Less than or equal to
between - Within range (inclusive)

Example Usage

filters = {
  "price": {"lt": 100},
  "rating": {"gte": 4.5},
  "created_at": {"between": ["2023-01-01", "2023-12-31"]}
}
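
The semantics of these operators can be illustrated with a small local evaluator. This is a sketch, not Mixpeek's implementation. It assumes that conditions on different fields are combined with AND and that dates are ISO 8601 strings in the same format, so plain string comparison orders them correctly. `OPERATORS` and `matches` are hypothetical names.

```python
import operator

# Map each operator name from the list above to a Python comparison.
OPERATORS = {
    "eq": operator.eq,
    "neq": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    # Inclusive range check: bounds is a [low, high] pair.
    "between": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


def matches(document, filters):
    """True when the document satisfies every condition on every field."""
    for field, conditions in filters.items():
        if field not in document:
            return False
        for op_name, operand in conditions.items():
            if not OPERATORS[op_name](document[field], operand):
                return False
    return True


filters = {
    "price": {"lt": 100},
    "rating": {"gte": 4.5},
    "created_at": {"between": ["2023-01-01", "2023-12-31"]},
}

docs = [
    {"id": "a", "price": 80, "rating": 4.7, "created_at": "2023-06-15"},
    {"id": "b", "price": 120, "rating": 4.9, "created_at": "2023-03-01"},
    {"id": "c", "price": 50, "rating": 4.8, "created_at": "2024-01-10"},
]

matching_ids = [d["id"] for d in docs if matches(d, filters)]
print(matching_ids)
```

Document `b` fails the price condition and document `c` falls outside the date range, so only `a` passes all three conditions.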

Best Practices

Start Simple

Begin with a simple retriever design and add complexity as needed. Often a basic vector search with filtering is sufficient.

Use Appropriate Indexes

Choose the right vector indexes for your content type. Use “text” for text-heavy content, “multimodal” for mixed content, and “image” for visual search.

Pre-filter When Possible

Apply metadata filters early in the pipeline to reduce the number of documents that need vector similarity calculation.
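
A small sketch shows why pre-filtering helps: the similarity function runs only on documents that survive the metadata filter. The `search` function, the `category` field, and the dot-product scoring are assumptions made for illustration.

```python
def dot(a, b):
    """Dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


def search(query_vec, docs, category=None):
    """Score documents, optionally pre-filtering by category.

    Returns (ranked_ids, number_of_similarity_computations).
    """
    # Metadata filter first: cheap equality check per document.
    candidates = [d for d in docs if category is None or d["category"] == category]
    # Vector similarity second: computed only for the surviving candidates.
    scored = [(d["id"], dot(query_vec, d["vector"])) for d in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored], len(candidates)


docs = [
    {"id": "a", "category": "video", "vector": [0.9, 0.1]},
    {"id": "b", "category": "image", "vector": [0.8, 0.3]},
    {"id": "c", "category": "video", "vector": [0.2, 0.9]},
    {"id": "d", "category": "text", "vector": [0.5, 0.5]},
]
query_vec = [1.0, 0.0]

all_ids, all_cost = search(query_vec, docs)
video_ids, video_cost = search(query_vec, docs, category="video")
print(all_cost, video_cost)
```

With four documents the saving is trivial, but the number of similarity computations scales with the candidate set, so a selective filter pays off on large collections.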

Mind Your Limits

Set appropriate limits at each stage. Start with larger limits in early stages and narrow down in later stages.
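
The effect of stage limits can be seen in a two-stage sketch: a coarse first stage selects candidates, and a finer second stage reranks them. The `coarse` and `fine` scores are made-up numbers standing in for a fast first-pass score and a more precise reranker score. `top_k` and `two_stage` are hypothetical helpers.

```python
def top_k(items, score_fn, limit):
    """Keep the `limit` highest-scoring items, best first."""
    return sorted(items, key=score_fn, reverse=True)[:limit]


def two_stage(docs, first_limit, final_limit):
    """Stage 1 narrows by coarse score, stage 2 reranks by fine score."""
    candidates = top_k(docs, lambda d: d["coarse"], first_limit)
    return top_k(candidates, lambda d: d["fine"], final_limit)


docs = [
    {"id": "a", "coarse": 0.91, "fine": 0.40},
    {"id": "b", "coarse": 0.85, "fine": 0.95},
    {"id": "c", "coarse": 0.80, "fine": 0.70},
    {"id": "d", "coarse": 0.60, "fine": 0.90},
    {"id": "e", "coarse": 0.30, "fine": 0.99},
    {"id": "f", "coarse": 0.10, "fine": 0.20},
]

narrow = [d["id"] for d in two_stage(docs, first_limit=4, final_limit=2)]
wide = [d["id"] for d in two_stage(docs, first_limit=5, final_limit=2)]
print(narrow, wide)
```

With a first-stage limit of 4, document `e` never reaches the reranker even though it has the best fine score. Widening the first stage to 5 lets it through, which is why early limits should be generous and later ones tight.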

Leverage Caching

Use caching to improve performance for frequently accessed queries. See the Caching documentation for details.
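
As a general illustration of why caching helps repeated queries, the sketch below memoizes a search function on the client side with Python's standard `functools.lru_cache`. This is not Mixpeek's caching feature, which the Caching documentation covers. `expensive_search` is a hypothetical stand-in for a slow retriever call.

```python
from functools import lru_cache

# Counts how often the underlying (slow) search actually runs.
CALLS = {"count": 0}


def expensive_search(text):
    """Hypothetical stand-in for a slow retriever call."""
    CALLS["count"] += 1
    # A tuple keeps the cached value immutable, so callers cannot
    # accidentally modify what later cache hits will return.
    return (f"result for {text}",)


@lru_cache(maxsize=128)
def cached_search(text):
    """Memoized wrapper: identical query strings are served from memory."""
    return expensive_search(text)


first = cached_search("image classification")
second = cached_search("image classification")  # served from the cache
third = cached_search("object detection")       # new query, runs the search

print(CALLS["count"])
```

A cache like this only pays off when the same query recurs and the underlying data changes slowly. Stale results are the usual cost, so any real deployment needs an invalidation or expiry policy.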

Complex retrievers with many stages can impact search latency. Start with a simple design and add complexity only when needed for your use case.

Retrievers vs Direct Document Queries

When to Use Retrievers

Semantic search based on meaning
Multimodal search across different content types
Complex search pipelines with multiple stages
When relevance ranking is important

When to Use Document Queries

Simple metadata filtering
Exact match requirements
When performance is critical for simple queries
For administrative operations

API Reference

For complete details on working with retrievers, see our Retrievers API Reference.
