Retrievers are the core search components of Mixpeek, providing flexible and powerful ways to search across your multimodal content with customizable pipelines.

Overview

Retrievers in Mixpeek are configurable search pipelines that search your processed content using a combination of vector similarity, metadata filtering, and other search techniques, so you can tailor search behavior to your use case.

Building and using a retriever involves five steps:

1. Define Retriever Query Schema
   Specify the structure of queries that your retriever will accept, including required and optional parameters.

2. Select Stages
   Choose which retrieval stages to include in your pipeline, such as vector search, filtering, reranking, or fusion stages.

3. Configure Inputs and Outputs
   For each stage, define how it receives inputs from previous stages and how its outputs will be passed to subsequent stages.

4. Save Retriever
   Save your configured retriever to make it available for queries within your namespace.

5. Execute Query
   Run search operations using your retriever with queries that match the defined schema structure.
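To make the first and last steps concrete, here is a minimal local sketch of what "a query that matches the defined schema" could mean. The schema layout (`required` and `optional` field maps) and the `validate_query` helper are hypothetical illustrations, not Mixpeek's actual schema format; the API reference is the authority on that.

```python
# Hypothetical schema layout, for illustration only (not Mixpeek's real format).
QUERY_SCHEMA = {
    "required": {"text": str},
    "optional": {"limit": int, "filters": dict},
}


def validate_query(query, schema):
    """Return a list of problems; an empty list means the query matches."""
    problems = []
    required = schema.get("required", {})
    optional = schema.get("optional", {})

    # Every required parameter must be present.
    for name in required:
        if name not in query:
            problems.append(f"missing required parameter: {name}")

    # Every supplied parameter must be known and have the expected type.
    for name, value in query.items():
        expected = required.get(name) or optional.get(name)
        if expected is None:
            problems.append(f"unknown parameter: {name}")
        elif not isinstance(value, expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


ok = validate_query({"text": "image classification", "limit": 10}, QUERY_SCHEMA)
bad = validate_query({"limit": "ten"}, QUERY_SCHEMA)
print(ok)
print(bad)
```

Checking a query locally like this before executing it can surface a missing or mistyped parameter without a round trip to the API.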

  • Search Pipelines: Create multi-stage search pipelines that combine different search techniques.
  • Multimodal Retrieval: Search across text, images, videos, and other content types.
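As an illustration of how a pipeline might combine the outputs of different search techniques, the sketch below implements reciprocal rank fusion (RRF), one widely used way to merge ranked lists. Whether Mixpeek's fusion stage uses RRF is not stated here, so treat this as a generic example; the function name, the sample document IDs, and the constant `k=60` (a conventional default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; each list is ordered best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the higher it ranks in each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of two separate stages.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that appear near the top of several lists rise in the fused ranking, which is why `doc_b` and `doc_a` end up ahead of documents found by only one technique.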

Key Concepts

Retriever Architecture

Creating a Basic Retriever

from mixpeek import Mixpeek

mp = Mixpeek(api_key="YOUR_API_KEY")

# Create a basic retriever with a single vector search stage
retriever = mp.retrievers.create(
    namespace_id="ns_abc123",
    name="simple-search",
    description="Basic text search across documents",
    stages=[
        {
            "name": "vector_search",
            "type": "vector",
            "collection_id": "col_def456",
            "index": "text",
            "limit": 20
        }
    ]
)

retriever_id = retriever["retriever_id"]
print(f"Created retriever: {retriever_id}")

Searching with a Retriever

Once you’ve created a retriever, you can use it to search your content:

# Search using a text query
results = mp.retrievers.execute(
    retriever_id=retriever_id,
    query={
        "text": "machine learning algorithms for image classification"
    }
)

# Display results
for result in results["results"]:
    print(f"Document: {result['document_id']}")
    print(f"Score: {result['score']}")
    print(f"Title: {result.get('title', 'N/A')}")
    print("---")

Query Parameters

Different retriever stages can use different query parameters.

Retriever Use Cases

Content-Based Semantic Search

Retrieve documents based on meaning rather than exact keyword matches.

Implementation Pattern

  • Use embedding-based retrievers for semantic understanding
  • Optimize for capturing conceptual relationships
  • Configure appropriate similarity thresholds
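The pattern above can be sketched locally with cosine similarity and a cutoff. This is a toy illustration rather than how Mixpeek scores documents internally: the three-dimensional vectors stand in for real embeddings, and the `semantic_search` helper and the `0.8` threshold are assumptions chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, docs, threshold=0.8):
    """Return (doc_id, score) pairs at or above the threshold, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    kept = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy "embeddings"; real ones come from an embedding model.
docs = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}

hits = semantic_search([1.0, 0.2, 0.0], docs, threshold=0.8)
print([doc_id for doc_id, _ in hits])
```

Raising the threshold trades recall for precision: fewer documents pass, but those that do are closer in meaning to the query.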

Filters and Query Operators

Numeric and Date Comparisons

Operators for comparing numeric values and dates:

Available Operators

  • eq - Equal to
  • neq - Not equal to
  • gt - Greater than
  • gte - Greater than or equal to
  • lt - Less than
  • lte - Less than or equal to
  • between - Within range (inclusive)

Example Usage

filters = {
  "price": {"lt": 100},
  "rating": {"gte": 4.5},
  "created_at": {"between": ["2023-01-01", "2023-12-31"]}
}
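To show what these operators mean, the following sketch evaluates the same filter dictionary against a few in-memory documents. It is a local model of the semantics, not Mixpeek's implementation: the `matches` helper and sample documents are hypothetical, and combining conditions with AND is an assumption. Dates are compared as ISO 8601 strings, which sort correctly as plain text.

```python
import operator

# Operator names from the list above, mapped to Python comparisons.
OPERATORS = {
    "eq": operator.eq,
    "neq": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    # Inclusive on both ends, as documented for "between".
    "between": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


def matches(document, filters):
    """True if the document satisfies every condition (ANDed, by assumption)."""
    for field, conditions in filters.items():
        if field not in document:
            return False
        for op_name, target in conditions.items():
            if not OPERATORS[op_name](document[field], target):
                return False
    return True


filters = {
    "price": {"lt": 100},
    "rating": {"gte": 4.5},
    "created_at": {"between": ["2023-01-01", "2023-12-31"]},
}
documents = [
    {"id": "a", "price": 80, "rating": 4.7, "created_at": "2023-06-15"},
    {"id": "b", "price": 120, "rating": 4.9, "created_at": "2023-03-01"},
    {"id": "c", "price": 50, "rating": 4.5, "created_at": "2023-12-31"},
    {"id": "d", "price": 50, "rating": 4.8, "created_at": "2024-01-01"},
]

matching_ids = [doc["id"] for doc in documents if matches(doc, filters)]
print(matching_ids)
```

Document `c` passes because `gte` and `between` both include their boundary values, while `b` fails on price and `d` falls outside the date range.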

Best Practices

1. Start Simple
   Begin with a simple retriever design and add complexity as needed. Often a basic vector search with filtering is sufficient.

2. Use Appropriate Indexes
   Choose the right vector indexes for your content type. Use “text” for text-heavy content, “multimodal” for mixed content, and “image” for visual search.

3. Pre-filter When Possible
   Apply metadata filters early in the pipeline to reduce the number of documents that need vector similarity calculation.

4. Mind Your Limits
   Set appropriate limits at each stage. Start with larger limits in early stages and narrow down in later stages.

Complex retrievers with many stages can impact search latency. Start with a simple design and add complexity only when needed for your use case.
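The pre-filtering and limit advice can be illustrated with a small in-memory pipeline. This is a sketch of the idea only: `run_pipeline`, its parameters, and the rating-based "rerank" are hypothetical stand-ins, not Mixpeek stage types.

```python
def dot(a, b):
    """Dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


def run_pipeline(query_vec, docs, max_price, search_limit=3, final_limit=2):
    # Stage 1: a cheap metadata pre-filter shrinks the candidate set.
    candidates = [d for d in docs if d["price"] <= max_price]

    # Stage 2: vector scoring runs only on the survivors, with a generous limit.
    scored = sorted(candidates, key=lambda d: dot(query_vec, d["vector"]), reverse=True)
    shortlist = scored[:search_limit]

    # Stage 3: a stand-in rerank (prefer higher rating) with a tighter limit.
    reranked = sorted(shortlist, key=lambda d: d["rating"], reverse=True)
    return [d["id"] for d in reranked[:final_limit]], len(candidates)


docs = [
    {"id": "a", "price": 20, "rating": 4.1, "vector": [0.9, 0.1]},
    {"id": "b", "price": 500, "rating": 4.9, "vector": [1.0, 0.0]},
    {"id": "c", "price": 35, "rating": 4.8, "vector": [0.7, 0.3]},
    {"id": "d", "price": 60, "rating": 3.9, "vector": [0.1, 0.9]},
    {"id": "e", "price": 45, "rating": 4.5, "vector": [0.8, 0.2]},
]

top_ids, scored_count = run_pipeline([1.0, 0.0], docs, max_price=100)
print(top_ids, scored_count)
```

The early limit is kept larger than the final one so the rerank stage has enough candidates to reorder; if the first stage cut to the final size, later stages could only shuffle results, never recover a better document.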

Retrievers vs Direct Document Queries

When to Use Retrievers

  • Semantic search based on meaning
  • Multimodal search across different content types
  • Complex search pipelines with multiple stages
  • When relevance ranking is important

When to Use Document Queries

  • Simple metadata filtering
  • Exact match requirements
  • When performance is critical for simple queries
  • For administrative operations

API Reference

For complete details on working with retrievers, see our Retrievers API Reference.