Filters

Filters narrow the set of documents a retriever returns. They operate on document payloads (metadata, enrichments, passthrough fields). You can apply them directly in the retriever execution payload, before any stage runs, or as dedicated filter@v1 stages that act on the results of earlier stages.

Filter Strategies

Strategy	When to Use	Parameters
`structured`	Fast field-based filtering using comparison operators	`structured_filter` (supports `AND`, `OR`, `NOT`, `eq`, `lt`, `gte`, `in`, etc.)
`text`	Convert natural language descriptions into structured predicates	`text_filter`, optional `target_fields`
`llm`	Ask an LLM to judge whether documents match free-form criteria	`llm_filter` (instruction, model, threshold, batch size)
`custom`	Execute a bespoke Python expression (can be slow on large candidate sets)	`function_code` (stringified lambda)
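The `custom` strategy accepts `function_code` as a stringified lambda. This page doesn't describe how the platform compiles or isolates that code, so the sketch below is only an illustration. It assumes the lambda receives one document payload as a dict and returns a truthy value. `compile_predicate` and `apply_custom_filter` are hypothetical helpers. Note that `eval` is not a sandbox, so never feed it untrusted code.

```python
from typing import Any, Callable

Document = dict[str, Any]


def compile_predicate(function_code: str) -> Callable[[Document], bool]:
    """Turn a stringified lambda into a callable (hypothetical helper).

    Assumption: the lambda takes a single document dict and returns a truthy value.
    Emptying __builtins__ limits casual misuse but is NOT a security sandbox.
    """
    func = eval(function_code, {"__builtins__": {}}, {})
    if not callable(func):
        raise ValueError("function_code must evaluate to a callable")
    return func


def apply_custom_filter(docs: list[Document], function_code: str) -> list[Document]:
    predicate = compile_predicate(function_code)
    # The predicate is plain Python called once per document, which is why
    # this strategy can be slow on large candidate sets.
    return [doc for doc in docs if predicate(doc)]


docs = [
    {"metadata": {"category": "audio", "price": 80}},
    {"metadata": {"category": "video", "price": 40}},
]
kept = apply_custom_filter(docs, "lambda doc: doc['metadata']['price'] < 50")
print(kept)  # [{'metadata': {'category': 'video', 'price': 40}}]
```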

Example stage:

{
  "stage_name": "filter",
  "version": "v1",
  "parameters": {
    "strategy": "structured",
    "structured_filter": {
      "AND": [
        { "field": "metadata.category", "operator": "eq", "value": "audio" },
        { "field": "metadata.price", "operator": "lte", "value": "{{INPUT.max_price}}" }
      ]
    }
  }
}

Filter Operators

Operator	Description
`eq`, `ne`, `gt`, `gte`, `lt`, `lte`	Comparison
`in`, `nin`	Membership
`exists`, `is_null`	Presence checks
`contains`, `starts_with`, `ends_with`, `regex`	Text predicates
`AND`, `OR`, `NOT`	Logical composition (nestable)

Matching is case-insensitive by default. Set `"case_sensitive": true` on the filter payload when you need exact-case matches.
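The text predicates and the case-sensitivity switch can be modelled in a few lines. This sketch makes two assumptions the page doesn't confirm. First, `regex` matches anywhere in the value, like Python's `re.search`. Second, `case_sensitive` defaults to false for all four text operators. `text_match` is a hypothetical helper.

```python
import re


def text_match(value: str, op: str, pattern: str, case_sensitive: bool = False) -> bool:
    """Evaluate one text predicate; case-insensitive unless case_sensitive=True."""
    if op == "regex":
        flags = 0 if case_sensitive else re.IGNORECASE
        return re.search(pattern, value, flags) is not None
    if not case_sensitive:
        # casefold() is a stricter lower() suited to caseless comparison.
        value, pattern = value.casefold(), pattern.casefold()
    if op == "contains":
        return pattern in value
    if op == "starts_with":
        return value.startswith(pattern)
    if op == "ends_with":
        return value.endswith(pattern)
    raise ValueError(f"unsupported text operator: {op}")


title = "Noise-Cancelling Headphones"
print(text_match(title, "contains", "headphones"))                       # True
print(text_match(title, "contains", "headphones", case_sensitive=True))  # False
print(text_match(title, "regex", r"^noise-\w+"))                         # True
```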

Where Filters Live

Retriever Execution Payload
Add a filters object directly in the execute call. The filter runs before the stage pipeline, reducing the initial candidate set.
Filter Stage
Insert filter@v1 between other stages to operate on intermediate results (after KNN, before rerank, etc.).
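The two placements differ only in where the predicate sits in the flow. The toy pipeline below assumes each stage is a function from a candidate list to a candidate list. `run_retriever`, `knn_stage`, `filter_stage`, and `rerank_stage` are hypothetical stand-ins, not Mixpeek APIs.

```python
from typing import Callable, Optional

Stage = Callable[[list[dict]], list[dict]]


def run_retriever(corpus: list[dict],
                  prefilter: Optional[Callable[[dict], bool]],
                  stages: list[Stage]) -> list[dict]:
    # Execution-payload filter: applied once, before any stage runs,
    # so the first stage already sees a reduced candidate set.
    candidates = [d for d in corpus if prefilter(d)] if prefilter else list(corpus)
    for stage in stages:
        candidates = stage(candidates)
    return candidates


def knn_stage(docs: list[dict]) -> list[dict]:
    # Hypothetical KNN: keep the 3 highest precomputed similarity scores.
    return sorted(docs, key=lambda d: d["score"], reverse=True)[:3]


def filter_stage(docs: list[dict]) -> list[dict]:
    # filter@v1 placed after KNN: it only sees the KNN output.
    return [d for d in docs if d["in_stock"]]


def rerank_stage(docs: list[dict]) -> list[dict]:
    # Hypothetical reranker: cheapest first, standing in for a model score.
    return sorted(docs, key=lambda d: d["price"])


corpus = [
    {"id": "a", "category": "audio", "score": 0.90, "in_stock": True, "price": 120},
    {"id": "b", "category": "audio", "score": 0.80, "in_stock": False, "price": 60},
    {"id": "c", "category": "video", "score": 0.95, "in_stock": True, "price": 30},
    {"id": "d", "category": "audio", "score": 0.70, "in_stock": True, "price": 90},
    {"id": "e", "category": "audio", "score": 0.60, "in_stock": True, "price": 50},
]
pipeline = [knn_stage, filter_stage, rerank_stage]
with_prefilter = run_retriever(corpus, lambda d: d["category"] == "audio", pipeline)
without_prefilter = run_retriever(corpus, None, pipeline)
print([d["id"] for d in with_prefilter])     # ['d', 'a']
print([d["id"] for d in without_prefilter])  # ['c', 'a']
```

Without the payload filter, the video document `c` takes one of the three KNN slots. Placement therefore changes which documents reach the later stages, not just how many.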

Use input templates to reference request inputs or prior stage outputs:

{
  "field": "metadata.category",
  "operator": "eq",
  "value": "{{INPUT.category}}"
}
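This page doesn't document how templates are resolved internally. The rough model below makes three assumptions:

- A value that is exactly one `{{NAMESPACE.path}}` placeholder is replaced by the referenced value with its type preserved.
- Placeholders embedded in a longer string are interpolated as text.
- Only dotted lookups are supported, with no expressions.

`resolve_templates` and the shape of `context` are hypothetical.

```python
import re
from typing import Any

PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def lookup(context: dict, path: str) -> Any:
    """Resolve a dotted path such as 'INPUT.category' against the context."""
    value: Any = context
    for key in path.split("."):
        value = value[key]  # raises KeyError for an unknown reference
    return value


def resolve_templates(node: Any, context: dict) -> Any:
    """Walk a filter definition and substitute {{...}} placeholders."""
    if isinstance(node, dict):
        return {k: resolve_templates(v, context) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_templates(v, context) for v in node]
    if isinstance(node, str):
        whole = PLACEHOLDER.fullmatch(node)
        if whole:
            # The whole string is one placeholder: keep the referenced value's type.
            return lookup(context, whole.group(1))
        # Otherwise interpolate each placeholder as text.
        return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), node)
    return node


condition = {"field": "metadata.category", "operator": "eq", "value": "{{INPUT.category}}"}
context = {"INPUT": {"category": "audio", "max_price": 100}}
print(resolve_templates(condition, context))
# {'field': 'metadata.category', 'operator': 'eq', 'value': 'audio'}
```

Keeping the type of a whole-string placeholder matters for numeric comparisons. For example, `{{INPUT.max_price}}` then resolves to the integer `100` rather than the string `"100"`.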

LLM Filter Example

{
  "stage_name": "filter",
  "version": "v1",
  "parameters": {
    "strategy": "llm",
    "llm_filter": {
      "instruction": "Return true only if the review mentions battery life issues.",
      "model": "gpt-4o-mini",
      "batch_size": "{{5 * INPUT.page_size}}",
      "threshold": 0.7
    }
  }
}

Documents are grouped into batches of `batch_size` and evaluated asynchronously.
Stage statistics report token usage and latency so you can monitor cost.
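The batching behaviour can be modelled with `asyncio`. This is a sketch under stated assumptions. `judge_batch` is a hypothetical stand-in for the model call: it returns one score in [0, 1] per document plus a token count. A document is kept when its score is at least `threshold`. The real scoring, prompt format, and statistics fields may differ.

```python
import asyncio
import time


async def judge_batch(instruction: str, batch: list[dict]) -> tuple[list[float], int]:
    """Hypothetical LLM call: score each document and report tokens used.

    A keyword check stands in for the model so the sketch runs offline,
    and whitespace-split words stand in for tokens.
    """
    await asyncio.sleep(0.01)  # simulate network latency
    scores = [0.9 if "battery" in doc["text"].lower() else 0.1 for doc in batch]
    tokens = sum(len(doc["text"].split()) for doc in batch) + len(instruction.split())
    return scores, tokens


async def llm_filter(docs: list[dict], instruction: str, batch_size: int,
                     threshold: float) -> tuple[list[dict], dict]:
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    start = time.perf_counter()
    # Evaluate all batches concurrently; gather() returns results in batch order.
    results = await asyncio.gather(*(judge_batch(instruction, b) for b in batches))
    kept, total_tokens = [], 0
    for batch, (scores, tokens) in zip(batches, results):
        total_tokens += tokens
        kept.extend(doc for doc, score in zip(batch, scores) if score >= threshold)
    stats = {"batches": len(batches), "tokens": total_tokens,
             "latency_s": time.perf_counter() - start}
    return kept, stats


reviews = [
    {"id": 1, "text": "Battery life is terrible after a week"},
    {"id": 2, "text": "Great sound quality"},
    {"id": 3, "text": "The battery drains overnight"},
]
kept, stats = asyncio.run(llm_filter(
    reviews, "Return true only if the review mentions battery life issues.",
    batch_size=2, threshold=0.7))
print([d["id"] for d in kept], stats["batches"])  # [1, 3] 2
```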

Best Practices

Filter early to shrink the candidate set before expensive stages (rerank, joins, LLM transforms).
Index frequently filtered fields using collection payload indexes to speed up structured filters.
Mix strategies: use structured filters for hard constraints and LLM filters for fuzzy criteria.
Avoid over-filtering by leaving room for semantic recall; reranking can handle final ordering.
Use templates to drive filter values from query inputs, stage outputs, or runtime context.
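Filtering early and mixing strategies combine naturally. Run the cheap structured predicate first so the expensive fuzzy check only sees the survivors. The sketch counts judge calls to show the saving. `two_tier_filter` and `expensive_judge` are hypothetical, and the judge is a stand-in for an LLM or rerank call.

```python
from typing import Callable


def two_tier_filter(docs: list[dict], hard: Callable[[dict], bool],
                    fuzzy: Callable[[dict], bool]) -> list[dict]:
    """Apply the cheap hard constraint first, then the costly fuzzy check on survivors."""
    survivors = [d for d in docs if hard(d)]
    return [d for d in survivors if fuzzy(d)]


calls = 0


def expensive_judge(doc: dict) -> bool:
    """Hypothetical fuzzy check; counts invocations to illustrate cost."""
    global calls
    calls += 1
    return "battery" in doc["text"]


docs = [
    {"category": "audio" if i % 4 == 0 else "video",
     "text": "battery drains" if i % 8 == 0 else "fine"}
    for i in range(100)
]
result = two_tier_filter(docs, lambda d: d["category"] == "audio", expensive_judge)
print(len(result), calls)  # 13 25
```

Running the judge first would have cost 100 calls instead of 25 for the same 13 results.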

Filters are a powerful complement to vector search. Combine them with Mixpeek’s caching and telemetry to keep retrieval precise without sacrificing performance.
