The Attribute Filter stage filters documents based on metadata field conditions. It supports simple single-field filtering and complex boolean logic (AND/OR/NOT). When used as a first stage, it retrieves documents directly from the database; when used after other stages, it filters in-memory results.
Stage Category: FILTER (reduces document set)
Transformation: N documents → M documents (where M ≤ N, based on conditions)
When to Use
| Use Case | Description |
|---|---|
| Metadata filtering | Filter by status, category, date, etc. |
| Post-search refinement | Narrow semantic search results |
| Access control | Filter by user permissions |
| Business logic | Active items, published content |
| Initial retrieval | Fetch documents by attributes (no embeddings) |
When NOT to Use
| Scenario | Recommended Alternative |
|---|---|
| Semantic similarity | feature_search |
| Content-based filtering | llm_filter |
| Complex text matching | feature_search with text features |
| Scoring/ranking | Use sort stages after filtering |
Parameters
Simple Mode
Use for single-condition filtering:
| Parameter | Type | Default | Description |
|---|---|---|---|
| field | string | Required | Field path to filter on |
| operator | string | eq | Comparison operator |
| value | any | Required | Value to compare against |
| case_insensitive | boolean | false | Case-insensitive string matching |
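As a rough illustration of how the four simple-mode parameters fit together, the sketch below combines the contains operator with case_insensitive. The field name metadata.title and the search value are illustrative, not taken from a real collection; the stage envelope (stage_type, stage_id, parameters) follows the configuration examples elsewhere on this page.

```json
{
  "stage_type": "filter",
  "stage_id": "attribute_filter",
  "parameters": {
    "field": "metadata.title",
    "operator": "contains",
    "value": "guide",
    "case_insensitive": true
  }
}
```

With case_insensitive left at its default of false, the same filter would presumably match "guide" but not "Guide".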
Boolean Mode
Use for complex multi-condition filtering:
| Parameter | Type | Default | Description |
|---|---|---|---|
| conditions | object | Required | Boolean condition object |
| batch_size | integer | 100 | Documents per batch (first-stage only) |
Supported Operators
| Operator | Description | Example Value |
|---|---|---|
| eq | Equals | "active", 42, true |
| ne | Not equals | "deleted" |
| gt | Greater than | 100 |
| gte | Greater than or equal | 4.5 |
| lt | Less than | 50 |
| lte | Less than or equal | 10 |
| in | In array | ["tech", "science"] |
| nin | Not in array | ["spam", "deleted"] |
| contains | Contains substring | "guide" |
| starts_with | Starts with | "intro" |
| ends_with | Ends with | ".pdf" |
| regex | Regular expression | "^[A-Z].*" |
| exists | Field exists | true or false |
| is_null | Field is null | true or false |
| text | Full-text search | "machine learning" |
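Each operator expects a value of a matching type. As one example, the in operator takes an array and matches documents whose field equals any element. A minimal sketch, assuming an illustrative metadata.category field:

```json
{
  "stage_type": "filter",
  "stage_id": "attribute_filter",
  "parameters": {
    "field": "metadata.category",
    "operator": "in",
    "value": ["tech", "science"]
  }
}
```

Swapping the operator for nin would keep only documents whose category is in neither of the listed values.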
Configuration Examples
Simple Equality
{
  "stage_type": "filter",
  "stage_id": "attribute_filter",
  "parameters": {
    "field": "metadata.status",
    "operator": "eq",
    "value": "published"
  }
}
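The value does not have to be a literal. The pipeline examples on this page pass query-time inputs through the {{INPUT.*}} template syntax, and the same approach can be paired with a numeric operator. A minimal sketch, assuming a hypothetical max_price input that resolves to a number and an illustrative metadata.price field:

```json
{
  "stage_type": "filter",
  "stage_id": "attribute_filter",
  "parameters": {
    "field": "metadata.price",
    "operator": "lte",
    "value": "{{INPUT.max_price}}"
  }
}
```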
Boolean Conditions
For complex filtering, use the conditions parameter with AND/OR/NOT logic:
AND Conditions
{
  "stage_type": "filter",
  "stage_id": "attribute_filter",
  "parameters": {
    "conditions": {
      "AND": [
        { "field": "metadata.status", "operator": "eq", "value": "active" },
        { "field": "metadata.in_stock", "operator": "eq", "value": true },
        { "field": "metadata.price", "operator": "lte", "value": 500 }
      ]
    }
  }
}
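The same structure presumably extends to OR, NOT, and nested groups. The sketch below assumes that OR takes an array of conditions like AND, that NOT wraps a single condition object, and that groups can be nested inside one another. None of these shapes is confirmed by the AND example above, so check them against the API reference before relying on them. Field names are illustrative.

```json
{
  "stage_type": "filter",
  "stage_id": "attribute_filter",
  "parameters": {
    "conditions": {
      "AND": [
        { "field": "metadata.status", "operator": "eq", "value": "active" },
        {
          "OR": [
            { "field": "metadata.category", "operator": "eq", "value": "tech" },
            { "field": "metadata.featured", "operator": "eq", "value": true }
          ]
        },
        {
          "NOT": { "field": "metadata.title", "operator": "contains", "value": "draft" }
        }
      ]
    }
  }
}
```

Read as a whole, this would keep active documents that are either in the tech category or featured, and whose title does not contain "draft".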
First-Stage vs Later-Stage Behavior
| Position | Behavior |
|---|---|
| First stage | Fetches documents directly from database (up to 1,000 per collection) |
| Later stage | Filters in-memory results from previous stages |
First-Stage Example
When no documents exist in the pipeline yet:
[
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "field": "metadata.status",
      "operator": "eq",
      "value": "published",
      "batch_size": 100
    }
  }
]
Later-Stage Example
After semantic search:
[
  {
    "stage_type": "filter",
    "stage_id": "feature_search",
    "parameters": {
      "searches": [
        {
          "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
          "query": "{{INPUT.query}}"
        }
      ],
      "final_top_k": 100
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "field": "metadata.in_stock",
      "operator": "eq",
      "value": true
    }
  }
]
Performance
| Metric | Value |
|---|---|
| Latency | 5-20 ms |
| First-stage limit | 1,000 documents per collection |
| In-memory filtering | < 5 ms for 1,000 docs |
| Index utilization | Uses indexes when available |
For best performance with feature_search, use pre-filters in the search stage instead of a separate attribute_filter stage. Pre-filters are applied at the vector index level.
Common Pipeline Patterns
Search + Filter + Sort
[
  {
    "stage_type": "filter",
    "stage_id": "feature_search",
    "parameters": {
      "searches": [
        {
          "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
          "query": "{{INPUT.query}}",
          "top_k": 100
        }
      ],
      "final_top_k": 50
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "conditions": {
        "AND": [
          { "field": "metadata.status", "operator": "eq", "value": "active" },
          { "field": "metadata.category", "operator": "in", "value": "{{INPUT.categories}}" }
        ]
      }
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "sort_attribute",
    "parameters": {
      "sort_field": "metadata.created_at",
      "order": "desc"
    }
  }
]
Attribute-Only Retrieval (No Embeddings)
[
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "conditions": {
        "AND": [
          { "field": "metadata.type", "operator": "eq", "value": "product" },
          { "field": "metadata.featured", "operator": "eq", "value": true }
        ]
      },
      "batch_size": 50
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "sort_attribute",
    "parameters": {
      "sort_field": "metadata.priority",
      "order": "desc"
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "sample",
    "parameters": {
      "limit": 10
    }
  }
]
Multi-Stage Filtering
[
  {
    "stage_type": "filter",
    "stage_id": "feature_search",
    "parameters": {
      "searches": [
        {
          "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
          "query": "{{INPUT.query}}"
        }
      ],
      "final_top_k": 200
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "field": "metadata.category",
      "operator": "eq",
      "value": "{{INPUT.category}}"
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 20
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "field": "metadata.in_stock",
      "operator": "eq",
      "value": true
    }
  }
]
Comparison: attribute_filter vs feature_search Pre-Filters
| Aspect | attribute_filter (stage) | feature_search pre-filters |
|---|---|---|
| When applied | After search results | During vector search |
| Performance | Good | Best (index-level) |
| Use case | Post-filtering, first-stage retrieval | Always when possible |
| Flexibility | Can be placed anywhere | Only with feature_search |
Recommendation: When filtering during semantic search, prefer pre-filters in feature_search. Use attribute_filter for:
Post-search refinement based on previous stage outputs
First-stage attribute-only retrieval
Dynamic filters that depend on earlier stage results
Error Handling
| Error | Behavior |
|---|---|
| Field not found | Document excluded (treated as no match) |
| Invalid operator | Stage fails with error |
| Type mismatch | Attempts type coercion, then excludes |
| Invalid regex | Stage fails with error |
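Because a missing field excludes a document silently rather than raising an error, a filter can drop more results than intended. One way to make the intent explicit is to test for the field directly. The sketch below keeps documents that either lack the field or match the value. It assumes that OR takes an array of conditions in the same way AND does, and that exists with a value of false matches documents without the field; metadata.region is a hypothetical field name.

```json
{
  "stage_type": "filter",
  "stage_id": "attribute_filter",
  "parameters": {
    "conditions": {
      "OR": [
        { "field": "metadata.region", "operator": "exists", "value": false },
        { "field": "metadata.region", "operator": "eq", "value": "eu" }
      ]
    }
  }
}
```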