Filters narrow the set of candidate documents flowing through a retriever. They operate on document payloads (metadata, enrichments, passthrough fields) and can be applied either directly in the retriever execution payload or as dedicated filter@v1 stages.

Filter Strategies

| Strategy | When to Use | Parameters |
|---|---|---|
| structured | Fast field-based filtering using comparison operators | structured_filter (supports AND, OR, NOT, eq, lt, gte, in, etc.) |
| text | Convert natural language descriptions into structured predicates | text_filter, optional target_fields |
| llm | Ask an LLM to judge whether documents match free-form criteria | llm_filter (prompt, model, threshold, batch size) |
| custom | Execute a bespoke Python expression (careful with performance) | function_code (stringified lambda) |

Example stage:
{
  "stage_name": "filter",
  "version": "v1",
  "parameters": {
    "strategy": "structured",
    "structured_filter": {
      "AND": [
        { "field": "metadata.category", "operator": "eq", "value": "audio" },
        { "field": "metadata.price", "operator": "lte", "value": "{{INPUT.max_price}}" }
      ]
    }
  }
}
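
To make the semantics of a structured filter concrete, the following is a minimal local sketch of how such a filter tree could be evaluated against a document. It is not Mixpeek's implementation. The helper names (`get_field`, `matches`) are hypothetical, and two behaviors are assumptions: NOT is taken to wrap a single node, and a condition on a missing field is treated as not matching. The template value from the example is replaced with a literal number here.

```python
import operator
from typing import Any

# Comparison operators keyed by the names used in the filter payload.
COMPARISONS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}

def get_field(doc: dict, path: str) -> Any:
    """Resolve a dotted path such as 'metadata.price'; None if missing."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def matches(doc: dict, node: dict) -> bool:
    """Evaluate one filter node: a logical group or a leaf condition."""
    if "AND" in node:
        return all(matches(doc, child) for child in node["AND"])
    if "OR" in node:
        return any(matches(doc, child) for child in node["OR"])
    if "NOT" in node:  # assumption: NOT wraps a single node
        return not matches(doc, node["NOT"])
    value = get_field(doc, node["field"])
    if value is None:  # assumption: a missing field never matches
        return False
    return COMPARISONS[node["operator"]](value, node["value"])

structured_filter = {
    "AND": [
        {"field": "metadata.category", "operator": "eq", "value": "audio"},
        {"field": "metadata.price", "operator": "lte", "value": 100},
    ]
}
docs = [
    {"id": 1, "metadata": {"category": "audio", "price": 80}},
    {"id": 2, "metadata": {"category": "audio", "price": 150}},
    {"id": 3, "metadata": {"category": "video", "price": 50}},
]
kept_ids = [d["id"] for d in docs if matches(d, structured_filter)]
```

Only the first document satisfies both conditions, so `kept_ids` contains just its id.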

Filter Operators

| Operator | Description |
|---|---|
| eq, ne, gt, gte, lt, lte | Comparison |
| in, nin | Membership |
| exists, is_null | Presence checks |
| contains, starts_with, ends_with, regex | Text predicates |
| AND, OR, NOT | Logical composition (nestable) |

Matching is case-insensitive by default ("case_sensitive" defaults to false); set "case_sensitive": true on the filter payload when needed.
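
As an illustration of how the case-sensitivity flag changes text predicates, here is a small local sketch. The function `text_match` is hypothetical and only mimics the operators named above. It assumes that `regex` uses search semantics (a match anywhere in the value) and that the flag also applies to regular expressions; the source does not specify either point.

```python
import re

def text_match(op: str, field_value: str, target: str,
               case_sensitive: bool = False) -> bool:
    """Apply one text predicate, case-insensitive unless told otherwise."""
    if op == "regex":
        # Assumption: regex searches anywhere in the value.
        flags = 0 if case_sensitive else re.IGNORECASE
        return re.search(target, field_value, flags) is not None
    if not case_sensitive:
        # casefold() is a more aggressive lower() meant for caseless matching.
        field_value, target = field_value.casefold(), target.casefold()
    if op == "contains":
        return target in field_value
    if op == "starts_with":
        return field_value.startswith(target)
    if op == "ends_with":
        return field_value.endswith(target)
    raise ValueError(f"unsupported text operator: {op}")

title = "Wireless Headphones"
loose = text_match("contains", title, "headphones")
strict = text_match("contains", title, "headphones", case_sensitive=True)
```

With the default setting `loose` is `True`; with `case_sensitive=True` the lowercase target no longer matches, so `strict` is `False`.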

Where Filters Live

  1. Retriever Execution Payload
    Add a filters object directly in the execute call. The filter runs before the stage pipeline, reducing the initial candidate set.
  2. Filter Stage
    Insert filter@v1 between other stages to operate on intermediate results (after KNN, before rerank, etc.).
Use input templates to reference request inputs or prior stage outputs:
{
  "field": "metadata.category",
  "operator": "eq",
  "value": "{{INPUT.category}}"
}
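
The substitution step can be pictured with a short sketch. This is a hypothetical resolver, not Mixpeek's template engine: it only handles values that consist entirely of a single dotted reference such as `{{INPUT.category}}`, and it does not evaluate expressions like `{{5 * INPUT.page_size}}`. The `resolve` function and the shape of the `context` dictionary are assumptions made for illustration.

```python
import re
from typing import Any

# Matches a string that is exactly one reference, e.g. "{{INPUT.category}}".
TEMPLATE = re.compile(r"^\{\{\s*([A-Za-z_][\w.]*)\s*\}\}$")

def resolve(value: Any, context: dict) -> Any:
    """Recursively replace whole-string template references with context values."""
    if isinstance(value, dict):
        return {k: resolve(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, context) for v in value]
    if isinstance(value, str):
        found = TEMPLATE.match(value)
        if found:
            current: Any = context
            for key in found.group(1).split("."):
                current = current[key]  # raises KeyError for unknown references
            return current
    return value

condition = {
    "field": "metadata.category",
    "operator": "eq",
    "value": "{{INPUT.category}}",
}
context = {"INPUT": {"category": "audio", "max_price": 100}}
resolved = resolve(condition, context)
```

Because the reference is replaced by the underlying value rather than by its string form, a numeric input such as `max_price` stays a number after resolution, which matters for operators like lte.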

LLM Filter Example

{
  "stage_name": "filter",
  "version": "v1",
  "parameters": {
    "strategy": "llm",
    "llm_filter": {
      "instruction": "Return true only if the review mentions battery life issues.",
      "model": "gpt-4o-mini",
      "batch_size": "{{5 * INPUT.page_size}}",
      "threshold": 0.7
    }
  }
}
  • Documents are grouped into batches and evaluated asynchronously.
  • Stage statistics expose token usage and latency so you can monitor cost.
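
The batching behavior described above can be sketched locally with asyncio. This is an illustrative approximation rather than the actual stage: `make_batches`, `llm_filter`, and `keyword_judge` are hypothetical names, the judge is a keyword stub standing in for a real model call, and the threshold is assumed to be a minimum score compared with `>=`.

```python
import asyncio
from typing import Awaitable, Callable

# A judge takes a batch of documents and returns one score per document.
Judge = Callable[[list[dict]], Awaitable[list[float]]]

def make_batches(docs: list[dict], batch_size: int) -> list[list[dict]]:
    """Split documents into consecutive batches of at most batch_size."""
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]

async def llm_filter(docs: list[dict], judge: Judge,
                     batch_size: int, threshold: float) -> list[dict]:
    """Score all batches concurrently and keep documents at or above threshold."""
    batches = make_batches(docs, batch_size)
    # gather() runs the batch calls concurrently and preserves input order.
    score_lists = await asyncio.gather(*(judge(batch) for batch in batches))
    kept: list[dict] = []
    for batch, scores in zip(batches, score_lists):
        kept.extend(doc for doc, score in zip(batch, scores) if score >= threshold)
    return kept

async def keyword_judge(batch: list[dict]) -> list[float]:
    """Stub judge: scores 1.0 when the text mentions 'battery', else 0.0."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [1.0 if "battery" in doc["text"].lower() else 0.0 for doc in batch]

reviews = [
    {"text": "Battery dies fast"},
    {"text": "Great sound"},
    {"text": "battery life is poor"},
]
kept = asyncio.run(llm_filter(reviews, keyword_judge, batch_size=2, threshold=0.7))
```

Here the three reviews are split into two batches, both are scored concurrently, and the two reviews that mention the battery are kept in their original order.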

Best Practices

  1. Filter early to shrink the candidate set before expensive stages (rerank, joins, LLM transforms).
  2. Index frequently filtered fields using collection payload indexes to speed up structured filters.
  3. Mix strategies: use structured filters for hard constraints and LLM filters for fuzzy criteria.
  4. Avoid over-filtering by leaving room for semantic recall; reranking can handle final ordering.
  5. Use templates to drive filter values from query inputs, stage outputs, or runtime context.
Filters are a powerful complement to vector search. Combine them with Mixpeek’s caching and telemetry to keep retrieval precise without sacrificing performance.