Filters allow you to narrow down search results based on document metadata, enhancing search precision by combining semantic search with exact metadata matching.

Overview

Filters in Mixpeek enable you to refine search results by applying conditions to document metadata fields. They complement semantic vector search by allowing precise matching on structured data, such as dates, categories, numerical values, and tags.

Metadata Filtering

Apply conditions to document metadata fields to narrow down search results

Hybrid Search

Combine semantic similarity with metadata filtering for precise retrieval

Filter Usage

Filters can be used in two main contexts within Mixpeek: as filter stages in a retriever pipeline, and as query-time filters passed with a search request.

Filter Operators

Mixpeek supports filter operators for different data types. The examples on this page also use $in, $contains, $all, $size, $and, $or, and $not. The comparison operators are:
Operator   Description              Example
$eq        Equals                   {"field": {"$eq": value}}
$ne        Not equals               {"field": {"$ne": value}}
$gt        Greater than             {"field": {"$gt": value}}
$gte       Greater than or equal    {"field": {"$gte": value}}
$lt        Less than                {"field": {"$lt": value}}
$lte       Less than or equal       {"field": {"$lte": value}}
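
To make the semantics of the comparison operators concrete, here is a minimal local sketch that evaluates them against a metadata dictionary. It is an illustration only, not Mixpeek's implementation: the `matches` helper is hypothetical, and it assumes that a missing field never matches and that a bare value is shorthand for `$eq`.

```python
import operator

# Maps each comparison operator from the table to a Python comparison.
COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(metadata: dict, filter_spec: dict) -> bool:
    """Return True if every field condition in filter_spec holds for metadata.

    Hypothetical helper: assumes missing fields never match and that a bare
    value is shorthand for {"$eq": value}.
    """
    for field, condition in filter_spec.items():
        if field not in metadata:
            return False
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        # Several operators on one field must all hold.
        for op, expected in condition.items():
            if not COMPARATORS[op](metadata[field], expected):
                return False
    return True


doc = {"category": "technology", "price": 25, "rating": 4.5}
print(matches(doc, {"price": {"$gte": 10, "$lte": 50}}))  # True
print(matches(doc, {"rating": {"$gt": 4.5}}))  # False
```

Several operators on the same field, and several fields in the same filter, are all treated as required here, which matches how the range examples on this page read.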

Basic Filter Examples

Simple Equality Filter

{
  "category": "technology"
}

Numeric Range Filter

{
  "price": {"$gte": 10, "$lte": 50},
  "rating": {"$gt": 4.0}
}

Date Range Filter

{
  "publish_date": {"$gte": "2023-01-01T00:00:00Z", "$lt": "2024-01-01T00:00:00Z"}
}
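
Date filters like the one above are easier to get right when the timestamps are generated rather than typed by hand. The following sketch builds the same filter from `datetime` objects; `to_utc_iso` and `date_range_filter` are hypothetical helper names, and the sketch assumes timestamps are stored as ISO 8601 UTC strings, as in the example.

```python
from datetime import datetime, timezone


def to_utc_iso(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 UTC string ending in 'Z'."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range_filter(field: str, start: datetime, end: datetime) -> dict:
    """Build a half-open range filter: start <= field < end."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {field: {"$gte": to_utc_iso(start), "$lt": to_utc_iso(end)}}


year_2023 = date_range_filter(
    "publish_date",
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(year_2023)
```

Pairing `$gte` with `$lt` gives a half-open interval, so consecutive ranges (for example, one per year) never count a boundary timestamp twice.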

Array Contains Filter

{
  "tags": {"$contains": "machine-learning"}
}

Advanced Filter Examples

Logical Combinations

{
  "$and": [
    {
      "category": "electronics"
    },
    {
      "$or": [
        {
          "price": {"$lte": 100}
        },
        {
          "on_sale": true
        }
      ]
    }
  ]
}
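
Nested logical filters can be hard to read when written inline. As a sketch, two small helpers can compose the same structure programmatically; `all_of` and `any_of` are hypothetical names, not part of the Mixpeek SDK.

```python
def all_of(*conditions: dict) -> dict:
    """Combine conditions so that every one must match ($and)."""
    return {"$and": list(conditions)}


def any_of(*conditions: dict) -> dict:
    """Combine conditions so that at least one must match ($or)."""
    return {"$or": list(conditions)}


# Electronics that are either at most 100 in price or currently on sale.
affordable_or_discounted = any_of(
    {"price": {"$lte": 100}},
    {"on_sale": True},
)
electronics_deals = all_of({"category": "electronics"}, affordable_or_discounted)
print(electronics_deals)
```

The resulting dictionary has the same shape as the JSON above, with Python's `True` serializing to JSON `true`.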

Negation

{
  "$not": {
    "status": "out_of_stock"
  }
}

Complex Array Operations

{
  "tags": {
    "$all": ["python", "tutorial"],
    "$size": {"$gte": 3}
  }
}
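
One plausible reading of the array operators is sketched below as a local evaluator. This is an assumption-laden illustration rather than Mixpeek's documented behavior: it treats `$all` as "the array contains every listed value" and `$size` as a set of comparison operators applied to the array length. The `array_matches` helper is hypothetical.

```python
import operator

# Comparisons that may appear inside a $size condition (assumed semantics).
SIZE_CHECKS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def array_matches(values: list, condition: dict) -> bool:
    """Check an array field against $contains, $all, and $size conditions."""
    for op, expected in condition.items():
        if op == "$contains":
            ok = expected in values
        elif op == "$all":
            ok = all(item in values for item in expected)
        elif op == "$size":
            # Assumed: $size holds comparison operators applied to the length.
            ok = all(
                SIZE_CHECKS[cmp](len(values), bound)
                for cmp, bound in expected.items()
            )
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


condition = {"$all": ["python", "tutorial"], "$size": {"$gte": 3}}
print(array_matches(["python", "tutorial", "beginner"], condition))  # True
print(array_matches(["python", "tutorial"], condition))  # False
```

Under this reading, the second document fails only because it has two tags, even though it contains both required values.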

Using Filters in Retrievers

Filter Stage in a Retriever Pipeline

{
  "stages": [
    {
      "name": "vector_search",
      "type": "vector",
      "collection_id": "col_products",
      "index": "multimodal",
      "limit": 100
    },
    {
      "name": "category_filter",
      "type": "filter",
      "input": "vector_search.results",
      "filter": {
        "category": "electronics",
        "price": {"$lte": 500}
      },
      "limit": 50
    },
    {
      "name": "availability_filter",
      "type": "filter",
      "input": "category_filter.results",
      "filter": {
        "in_stock": true,
        "shipping_days": {"$lte": 3}
      },
      "limit": 20
    }
  ]
}
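
In the pipeline above, each filter stage reads the previous stage's output through its `input` field. When stages are assembled in code, that wiring can be generated instead of typed by hand. The sketch below assumes the stage schema shown above; `chain_filter_stages` is a hypothetical helper.

```python
def chain_filter_stages(search_stage: dict, filter_stages: list[dict]) -> list[dict]:
    """Return a stage list where each filter reads the previous stage's results."""
    stages = [search_stage]
    previous = search_stage["name"]
    for stage in filter_stages:
        stages.append({
            "name": stage["name"],
            "type": "filter",
            # Wire this stage to the output of the one before it.
            "input": f"{previous}.results",
            "filter": stage["filter"],
            "limit": stage["limit"],
        })
        previous = stage["name"]
    return stages


stages = chain_filter_stages(
    {"name": "vector_search", "type": "vector", "collection_id": "col_products",
     "index": "multimodal", "limit": 100},
    [
        {"name": "category_filter",
         "filter": {"category": "electronics", "price": {"$lte": 500}},
         "limit": 50},
        {"name": "availability_filter",
         "filter": {"in_stock": True, "shipping_days": {"$lte": 3}},
         "limit": 20},
    ],
)
print([stage.get("input") for stage in stages])
```

Reordering or removing a filter stage then only requires changing the list, since the `input` references are recomputed.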

Creating a Retriever with Filters

from mixpeek import Mixpeek

mp = Mixpeek(api_key="YOUR_API_KEY")

# Create a retriever with multiple filter stages
retriever = mp.retrievers.create(
    namespace_id="ns_abc123",
    name="filtered-product-search",
    description="Product search with category and availability filters",
    stages=[
        {
            "name": "vector_search",
            "type": "vector",
            "collection_id": "col_products",
            "index": "multimodal",
            "limit": 100
        },
        {
            "name": "category_filter",
            "type": "filter",
            "input": "vector_search.results",
            "filter": {
                "category": "electronics",
                "price": {"$lte": 500}
            },
            "limit": 50
        },
        {
            "name": "availability_filter",
            "type": "filter",
            "input": "category_filter.results",
            "filter": {
                "in_stock": True,
                "shipping_days": {"$lte": 3}
            },
            "limit": 20
        }
    ]
)

retriever_id = retriever["retriever_id"]

# Search using the retriever
results = mp.retrievers.search(
    retriever_id=retriever_id,
    query={
        "text": "wireless headphones with noise cancellation"
    }
)

Using Filters in Search Queries

Filters can also be passed directly with a search request, in which case they are applied at query time. The example below assumes the mp client created earlier:

# Search with query-time filters
results = mp.retrievers.search(
    retriever_id="ret_def456",
    query={
        "text": "wireless headphones"
    },
    filters={
        "brand": {"$in": ["Sony", "Bose", "Sennheiser"]},
        "price": {"$lte": 300},
        "rating": {"$gte": 4.0},
        "features": {"$contains": "noise-cancellation"}
    }
)
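
Query-time filters often come from optional user input, such as facets selected in a UI. A minimal sketch of building the `filters` dictionary so that only the supplied criteria are included follows; `build_filters` and its parameters are hypothetical, and the field names mirror the example above.

```python
def build_filters(brands=None, max_price=None, min_rating=None, feature=None) -> dict:
    """Build a query-time filter dict, skipping any criterion left unset."""
    filters = {}
    if brands:
        filters["brand"] = {"$in": list(brands)}
    if max_price is not None:
        filters["price"] = {"$lte": max_price}
    if min_rating is not None:
        filters["rating"] = {"$gte": min_rating}
    if feature:
        filters["features"] = {"$contains": feature}
    return filters


print(build_filters(brands=["Sony", "Bose"], max_price=300))
print(build_filters())  # {}
```

The numeric checks use `is not None` rather than truthiness so that a legitimate value of `0` is not silently dropped.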

Filter Optimization

Pre-filtering vs. Post-filtering

Pre-filtering

When to use:
  • To reduce the dataset size before vector search
  • For filters that can significantly reduce the number of candidates
  • When metadata fields have indexes
Implementation: Apply filters directly in the search query or in an early filter stage before vector search

Post-filtering

When to use:
  • After vector search to refine semantically relevant results
  • For more complex filters or combinations
  • When vector similarity is the primary ranking factor
Implementation: Apply filters in a stage after vector search in the retriever pipeline
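
The practical difference between the two approaches shows up in how many results survive. The toy simulation below uses an invented six-document corpus with made-up similarity scores and runs entirely in memory. It does not call Mixpeek; it only illustrates the ordering of the two steps.

```python
# Toy corpus: (doc_id, similarity_score, in_stock). Scores are invented.
CORPUS = [
    ("a", 0.95, False),
    ("b", 0.90, False),
    ("c", 0.85, True),
    ("d", 0.80, False),
    ("e", 0.75, True),
    ("f", 0.70, True),
]


def top_k(docs, k):
    """Return the k documents with the highest similarity score."""
    return sorted(docs, key=lambda d: d[1], reverse=True)[:k]


def post_filter(k):
    """Rank first, then drop non-matching documents from the top k."""
    return [d[0] for d in top_k(CORPUS, k) if d[2]]


def pre_filter(k):
    """Drop non-matching documents first, then rank the remainder."""
    return [d[0] for d in top_k([d for d in CORPUS if d[2]], k)]


print(post_filter(3))  # ['c']
print(pre_filter(3))   # ['c', 'e', 'f']
```

Post-filtering can return fewer results than requested when the filter removes most of the top candidates. This is one reason to give the vector search stage a larger limit than the filter stages that follow it.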

Indexing for Filters

For optimal filter performance, ensure that frequently filtered fields are properly indexed in your collections. Index the following types of fields:

Best Practices

1. Filter Early

Apply filters as early as possible in the pipeline to reduce the dataset size before more expensive operations.

2. Index Key Fields

Ensure fields used frequently in filters are properly indexed in your collections.

3. Use Precise Filters

Be as specific as possible with filter criteria to narrow down results effectively.

4. Avoid Over-Filtering

Balance filter specificity with result diversity. Overly restrictive filters may eliminate potentially relevant results.

Complex filters with many nested logical operations can impact query performance. When possible, simplify filters and ensure indexed fields are used for optimal performance.
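
One way to simplify a filter is to merge directly nested `$and` clauses into their parent. Because `$and` is associative, the flattened filter matches the same documents. The sketch below is a local transformation on the filter dictionary; `flatten_and` is a hypothetical helper, and whether flattening measurably improves performance in Mixpeek is not something this page confirms.

```python
def flatten_and(filter_spec: dict) -> dict:
    """Merge directly nested $and clauses into their parent $and."""
    if set(filter_spec) != {"$and"}:
        # Not a pure $and node: leave it unchanged.
        return filter_spec
    merged = []
    for clause in filter_spec["$and"]:
        clause = flatten_and(clause)
        if set(clause) == {"$and"}:
            merged.extend(clause["$and"])
        else:
            merged.append(clause)
    return {"$and": merged}


nested = {
    "$and": [
        {"category": "footwear"},
        {"$and": [
            {"in_stock": True},
            {"$and": [{"rating": {"$gte": 4.0}}]},
        ]},
    ]
}
print(flatten_and(nested))
```

Clauses under `$or` or `$not` are left untouched, since merging those into a parent `$and` would change the filter's meaning.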

Common Use Cases

E-commerce Product Filtering

{
  "query": {
    "text": "running shoes"
  },
  "filters": {
    "$and": [
      {
        "category": "footwear"
      },
      {
        "brand": {"$in": ["Nike", "Adidas", "New Balance"]}
      },
      {
        "price": {"$gte": 50, "$lte": 150}
      },
      {
        "size": {"$in": [9, 9.5, 10]}
      },
      {
        "color": {"$in": ["black", "blue", "gray"]}
      },
      {
        "rating": {"$gte": 4.0}
      },
      {
        "in_stock": true
      }
    ]
  }
}

Content Filtering

{
  "query": {
    "text": "machine learning tutorials"
  },
  "filters": {
    "$and": [
      {
        "content_type": {"$in": ["article", "video"]}
      },
      {
        "publish_date": {"$gte": "2023-01-01T00:00:00Z"}
      },
      {
        "duration_minutes": {"$lte": 30}
      },
      {
        "difficulty_level": {"$in": ["beginner", "intermediate"]}
      },
      {
        "tags": {"$contains": "python"}
      }
    ]
  }
}

User-Specific Filtering

{
  "query": {
    "text": "data visualization techniques"
  },
  "filters": {
    "$and": [
      {
        "$or": [
          {
            "access_level": "public"
          },
          {
            "allowed_user_ids": {"$contains": "user_123"}
          }
        ]
      },
      {
        "language": "en"
      },
      {
        "$not": {
          "viewed_by": {"$contains": "user_123"}
        }
      }
    ]
  }
}
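
Since the user ID appears twice in the filter above, generating it from a single argument avoids the two copies drifting apart. A minimal sketch follows; `personalized_filter` is a hypothetical helper, and the field names are taken from the example.

```python
def personalized_filter(user_id: str, language: str = "en") -> dict:
    """Build a filter for content the user may access and has not yet viewed."""
    return {
        "$and": [
            {
                "$or": [
                    {"access_level": "public"},
                    {"allowed_user_ids": {"$contains": user_id}},
                ]
            },
            {"language": language},
            {"$not": {"viewed_by": {"$contains": user_id}}},
        ]
    }


print(personalized_filter("user_123"))
```

When a filter enforces access control like this, the `user_id` should come from the authenticated session on the server rather than from a value the client supplies.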

API Reference

For complete details on using filters in retrievers and search queries, see our Retrievers API Reference and Search API Reference.