Attribute Filter stage showing metadata-based document filtering
The Attribute Filter stage filters documents based on metadata field conditions. It supports simple single-field filtering and complex boolean logic (AND/OR/NOT). When used as a first stage, it retrieves documents directly from the database; when used after other stages, it filters in-memory results.
Stage Category: FILTER (Reduces document set)
Transformation: N documents → M documents (where M ≤ N, based on conditions)

When to Use

| Use Case | Description |
| --- | --- |
| Metadata filtering | Filter by status, category, date, etc. |
| Post-search refinement | Narrow semantic search results |
| Access control | Filter by user permissions |
| Business logic | Active items, published content |
| Initial retrieval | Fetch documents by attributes (no embeddings) |

When NOT to Use

| Scenario | Recommended Alternative |
| --- | --- |
| Semantic similarity | feature_search |
| Content-based filtering | llm_filter |
| Complex text matching | feature_search with text features |
| Scoring/ranking | Use sort stages after filtering |

Parameters

Simple Mode

Use for single-condition filtering:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| field | string | Required | Field path to filter on |
| operator | string | eq | Comparison operator |
| value | any | Required | Value to compare against |
| case_insensitive | boolean | false | Case-insensitive string matching |
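
As a rough illustration of how the simple-mode parameters and their defaults fit together, the sketch below builds a stage definition as a plain Python dict. The helper name `simple_filter_stage` is hypothetical and not part of any SDK; only the parameter names and defaults come from the table above, and the `stage_type` and `stage_id` values are the ones used in the configuration examples on this page.

```python
import json
from typing import Any


def simple_filter_stage(field: str, value: Any, operator: str = "eq",
                        case_insensitive: bool = False) -> dict:
    """Build a simple-mode attribute_filter stage as a plain dict.

    Hypothetical helper: only the keys mirror the parameter table.
    """
    parameters = {"field": field, "operator": operator, "value": value}
    if case_insensitive:
        # Emitted only when enabled, since the documented default is false.
        parameters["case_insensitive"] = True
    return {
        "stage_type": "filter",
        "stage_id": "attribute_filter",
        "parameters": parameters,
    }


stage = simple_filter_stage("metadata.status", "published")
print(json.dumps(stage, indent=2))
```

Leaving `operator` unset falls back to `eq`, which matches the default listed in the table.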

Boolean Mode

Use for complex multi-condition filtering:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| conditions | object | Required | Boolean condition object |
| batch_size | integer | 100 | Documents per batch (first-stage only) |

Supported Operators

| Operator | Description | Example Value |
| --- | --- | --- |
| eq | Equals | "active", 42, true |
| ne | Not equals | "deleted" |
| gt | Greater than | 100 |
| gte | Greater than or equal | 4.5 |
| lt | Less than | 50 |
| lte | Less than or equal | 10 |
| in | In array | ["tech", "science"] |
| nin | Not in array | ["spam", "deleted"] |
| contains | Contains substring | "guide" |
| starts_with | Starts with | "intro" |
| ends_with | Ends with | ".pdf" |
| regex | Regular expression | "^[A-Z].*" |
| exists | Field exists | true or false |
| is_null | Field is null | true or false |
| text | Full-text search | "machine learning" |
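
To make the operator semantics concrete, here is a minimal local model of how each operator could be evaluated against one in-memory document. It is a sketch under assumptions, not the stage's actual implementation: `get_path`, `matches`, and the `_MISSING` sentinel are hypothetical names, a missing field is assumed never to match (except for `exists`), and the `text` operator is left out because full-text search would normally rely on an index.

```python
import re
from typing import Any

_MISSING = object()  # sentinel: the field path does not exist in the document


def get_path(doc: dict, path: str) -> Any:
    """Resolve a dotted path such as "metadata.status"; return _MISSING if absent."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


COMPARISONS = {
    "eq": lambda actual, value: actual == value,
    "ne": lambda actual, value: actual != value,
    "gt": lambda actual, value: actual > value,
    "gte": lambda actual, value: actual >= value,
    "lt": lambda actual, value: actual < value,
    "lte": lambda actual, value: actual <= value,
    "in": lambda actual, value: actual in value,
    "nin": lambda actual, value: actual not in value,
    "contains": lambda actual, value: value in actual,
    "starts_with": lambda actual, value: actual.startswith(value),
    "ends_with": lambda actual, value: actual.endswith(value),
    "regex": lambda actual, value: re.search(value, actual) is not None,
}


def matches(doc: dict, field: str, operator: str, value: Any) -> bool:
    """Return True when the document satisfies a single condition."""
    actual = get_path(doc, field)
    if operator == "exists":
        return (actual is not _MISSING) == value
    if actual is _MISSING:
        return False  # assumption: a missing field never matches
    if operator == "is_null":
        return (actual is None) == value
    return COMPARISONS[operator](actual, value)


doc = {"metadata": {"status": "active", "price": 42, "file": "guide.pdf", "owner": None}}
print(matches(doc, "metadata.status", "in", ["active", "draft"]))
print(matches(doc, "metadata.file", "ends_with", ".pdf"))
```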

Configuration Examples

{
  "stage_type": "filter",
  "stage_id": "attribute_filter",
  "parameters": {
    "field": "metadata.status",
    "operator": "eq",
    "value": "published"
  }
}

Boolean Conditions

For complex filtering, use the conditions parameter with AND/OR/NOT logic:
{
  "stage_type": "filter",
  "stage_id": "attribute_filter",
  "parameters": {
    "conditions": {
      "AND": [
        {"field": "metadata.status", "operator": "eq", "value": "active"},
        {"field": "metadata.in_stock", "operator": "eq", "value": true},
        {"field": "metadata.price", "operator": "lte", "value": 500}
      ]
    }
  }
}
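
The nesting of boolean conditions can be pictured as a small recursive evaluator. The sketch below is illustrative only: the `AND` key and the condition shape come from the example above, while the `OR` and `NOT` keys, the choice to have `NOT` wrap a single condition, and the helper names `get_path` and `evaluate` are assumptions made for this example.

```python
import operator as op
from typing import Any

COMPARATORS = {"eq": op.eq, "ne": op.ne, "gt": op.gt, "gte": op.ge, "lt": op.lt, "lte": op.le}
_MISSING = object()  # sentinel for an absent field


def get_path(doc: dict, path: str) -> Any:
    """Resolve a dotted field path, returning _MISSING when any segment is absent."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def evaluate(condition: dict, doc: dict) -> bool:
    """Recursively evaluate a boolean condition object against one document."""
    if "AND" in condition:
        return all(evaluate(child, doc) for child in condition["AND"])
    if "OR" in condition:  # assumed key, by analogy with "AND"
        return any(evaluate(child, doc) for child in condition["OR"])
    if "NOT" in condition:  # assumed to wrap exactly one condition
        return not evaluate(condition["NOT"], doc)
    actual = get_path(doc, condition["field"])
    if actual is _MISSING:
        return False
    compare = COMPARATORS[condition.get("operator", "eq")]
    return compare(actual, condition["value"])


conditions = {
    "AND": [
        {"field": "metadata.status", "operator": "eq", "value": "active"},
        {"OR": [
            {"field": "metadata.in_stock", "operator": "eq", "value": True},
            {"field": "metadata.price", "operator": "lte", "value": 500},
        ]},
        {"NOT": {"field": "metadata.category", "operator": "eq", "value": "spam"}},
    ]
}

docs = [
    {"id": 1, "metadata": {"status": "active", "in_stock": True, "price": 900, "category": "tech"}},
    {"id": 2, "metadata": {"status": "active", "in_stock": False, "price": 900, "category": "tech"}},
    {"id": 3, "metadata": {"status": "active", "in_stock": False, "price": 100, "category": "spam"}},
]
kept = [d["id"] for d in docs if evaluate(conditions, d)]
print(kept)  # [1]
```

Only the first document survives: the second fails both branches of the `OR`, and the third is rejected by the `NOT`.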

First-Stage vs Later-Stage Behavior

| Position | Behavior |
| --- | --- |
| First stage | Fetches documents directly from database (up to 1,000 per collection) |
| Later stage | Filters in-memory results from previous stages |
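
The difference between the two positions can be modeled in a few lines of Python. This is a simplified sketch, not the real execution engine: the function `attribute_filter`, the in-memory `collection` standing in for the database, and the decision to apply the 1,000-document cap to matched documents (rather than scanned ones) are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Optional

FIRST_STAGE_LIMIT = 1000  # per-collection cap stated in the table above

Document = dict
Predicate = Callable[[Document], bool]


def attribute_filter(predicate: Predicate,
                     upstream: Optional[List[Document]],
                     collection: Iterable[Document]) -> List[Document]:
    """Model both positions of the stage with plain Python lists."""
    if upstream is None:
        # First stage: nothing is in the pipeline yet, so read from the
        # collection (a stand-in for the database) and stop at the cap.
        fetched: List[Document] = []
        for doc in collection:
            if predicate(doc):
                fetched.append(doc)
                if len(fetched) >= FIRST_STAGE_LIMIT:
                    break
        return fetched
    # Later stage: only documents handed over by earlier stages are considered.
    return [doc for doc in upstream if predicate(doc)]


collection = [{"id": i, "status": "published" if i % 2 == 0 else "draft"} for i in range(5000)]


def is_published(doc: Document) -> bool:
    return doc["status"] == "published"


first = attribute_filter(is_published, None, collection)
later = attribute_filter(is_published, collection[:10], collection)
print(len(first), len(later))  # 1000 5
```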

First-Stage Example

When no documents exist in the pipeline yet:
[
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "field": "metadata.status",
      "operator": "eq",
      "value": "published",
      "batch_size": 100
    }
  }
]

Later-Stage Example

After semantic search:
[
  {
    "stage_type": "filter",
    "stage_id": "feature_search",
    "parameters": {
      "searches": [{"feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding", "query": "{{INPUT.query}}"}],
      "final_top_k": 100
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "field": "metadata.in_stock",
      "operator": "eq",
      "value": true
    }
  }
]

Performance

| Metric | Value |
| --- | --- |
| Latency | 5-20 ms |
| First-stage limit | 1,000 documents per collection |
| In-memory filtering | < 5 ms for 1,000 docs |
| Index utilization | Uses indexes when available |

For best performance with feature_search, use pre-filters in the search stage instead of a separate attribute_filter stage. Pre-filters are applied at the vector index level.

Common Pipeline Patterns

Search + Filter + Sort

[
  {
    "stage_type": "filter",
    "stage_id": "feature_search",
    "parameters": {
      "searches": [
        {
          "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
          "query": "{{INPUT.query}}",
          "top_k": 100
        }
      ],
      "final_top_k": 50
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "conditions": {
        "AND": [
          {"field": "metadata.status", "operator": "eq", "value": "active"},
          {"field": "metadata.category", "operator": "in", "value": "{{INPUT.categories}}"}
        ]
      }
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "sort_attribute",
    "parameters": {
      "sort_field": "metadata.created_at",
      "order": "desc"
    }
  }
]
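
The `{{INPUT.categories}}` placeholder above has to resolve to an array for the `in` operator to make sense. The sketch below shows one plausible way such placeholders could be resolved before a pipeline runs. It is an assumption about the mechanism, not a description of the platform's template engine: the `render` function is hypothetical, and the rule that a string consisting of a single placeholder is replaced by the raw input value (so a list stays a list) is a design choice made for this example.

```python
import re
from typing import Any

# Matches placeholders of the form {{INPUT.name}}, as used in the examples above.
PLACEHOLDER = re.compile(r"\{\{\s*INPUT\.(\w+)\s*\}\}")


def render(value: Any, inputs: dict) -> Any:
    """Recursively substitute {{INPUT.*}} placeholders in a stage definition."""
    if isinstance(value, dict):
        return {key: render(item, inputs) for key, item in value.items()}
    if isinstance(value, list):
        return [render(item, inputs) for item in value]
    if isinstance(value, str):
        whole = PLACEHOLDER.fullmatch(value)
        if whole:
            # The entire string is one placeholder: keep the input's native type.
            return inputs[whole.group(1)]
        # Placeholder embedded in a longer string: interpolate as text.
        return PLACEHOLDER.sub(lambda m: str(inputs[m.group(1)]), value)
    return value


parameters = {
    "conditions": {
        "AND": [
            {"field": "metadata.status", "operator": "eq", "value": "active"},
            {"field": "metadata.category", "operator": "in", "value": "{{INPUT.categories}}"},
        ]
    }
}

rendered = render(parameters, {"categories": ["tech", "science"]})
print(rendered["conditions"]["AND"][1]["value"])  # ['tech', 'science']
```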

Attribute-Only Retrieval (No Embeddings)

[
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "conditions": {
        "AND": [
          {"field": "metadata.type", "operator": "eq", "value": "product"},
          {"field": "metadata.featured", "operator": "eq", "value": true}
        ]
      },
      "batch_size": 50
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "sort_attribute",
    "parameters": {
      "sort_field": "metadata.priority",
      "order": "desc"
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "sample",
    "parameters": {
      "limit": 10
    }
  }
]

Multi-Stage Filtering

[
  {
    "stage_type": "filter",
    "stage_id": "feature_search",
    "parameters": {
      "searches": [{"feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding", "query": "{{INPUT.query}}"}],
      "final_top_k": 200
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "field": "metadata.category",
      "operator": "eq",
      "value": "{{INPUT.category}}"
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 20
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "field": "metadata.in_stock",
      "operator": "eq",
      "value": true
    }
  }
]

Comparison: attribute_filter vs feature_search Pre-Filters

| Aspect | attribute_filter (stage) | feature_search pre-filters |
| --- | --- | --- |
| When applied | After search results | During vector search |
| Performance | Good | Best (index-level) |
| Use case | Post-filtering, first-stage retrieval | Always when possible |
| Flexibility | Can be placed anywhere | Only with feature_search |

Recommendation: When filtering during semantic search, prefer pre-filters in feature_search. Use attribute_filter for:
  • Post-search refinement based on previous stage outputs
  • First-stage attribute-only retrieval
  • Dynamic filters that depend on earlier stage results

Error Handling

| Error | Behavior |
| --- | --- |
| Field not found | Document excluded (treated as no match) |
| Invalid operator | Stage fails with error |
| Type mismatch | Attempts type coercion; the document is excluded if coercion fails |
| Invalid regex | Stage fails with error |
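
The four behaviors in the table split into two groups: configuration problems fail the whole stage, while per-document problems silently exclude that document. The sketch below models that split locally. It is an illustration under assumptions, not the stage's real code: `StageError` and `filter_docs` are hypothetical names, only three operators are supported, fields are flat rather than dotted paths, and coercion is modeled as converting the document value to the type of the filter value.

```python
import re
from typing import Any, List


class StageError(Exception):
    """Raised when the stage configuration itself is invalid."""


SUPPORTED = {"eq", "gt", "regex"}  # small subset, enough to show each error path


def filter_docs(docs: List[dict], field: str, operator: str, value: Any) -> List[dict]:
    if operator not in SUPPORTED:
        raise StageError(f"invalid operator: {operator!r}")
    pattern = None
    if operator == "regex":
        try:
            pattern = re.compile(value)
        except re.error as exc:
            raise StageError(f"invalid regex: {exc}") from exc

    kept = []
    for doc in docs:
        if field not in doc:
            continue  # field not found: treated as no match
        actual = doc[field]
        if pattern is not None:
            if isinstance(actual, str) and pattern.search(actual):
                kept.append(doc)
            continue
        try:
            coerced = type(value)(actual)  # try to coerce to the filter value's type
        except (TypeError, ValueError):
            continue  # coercion failed: exclude the document
        if (operator == "eq" and coerced == value) or (operator == "gt" and coerced > value):
            kept.append(doc)
    return kept


docs = [{"price": 150}, {"price": "200"}, {"price": "n/a"}, {"name": "no price"}]
print(filter_docs(docs, "price", "gt", 100))
```

Here the string "200" is coerced and kept, "n/a" cannot be coerced and is dropped, and the document without a `price` field is skipped without raising.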