The Feature Search stage is the primary search stage for retrieval pipelines. It performs vector similarity search across one or more embedding features, supporting single-modal, multimodal, and hybrid search patterns. Results from multiple searches are fused using configurable strategies (RRF, DBSF, weighted, max, or learned).
Stage Category: FILTER (retrieves documents)
Transformation: 0 documents → N documents (retrieves from the collection based on vector similarity)
When to Use
| Use Case | Description |
|---|---|
| Semantic search | Find documents similar in meaning to a query |
| Image search | Search by image embeddings |
| Video search | Search by video frame embeddings |
| Multimodal search | Combine text + image + video in one query |
| Hybrid search | Fuse results from multiple embedding types |
| Decompose/recompose | Group results by parent document |
| Faceted search | Get result counts by field values |
When NOT to Use
| Scenario | Recommended Alternative |
|---|---|
| Exact field matching | attribute_filter |
| Full-text keyword search | Combine with text features |
| No embeddings in collection | attribute_filter |
| Post-search filtering only | Use attribute_filter after feature_search |
Core Concepts
Feature URIs
Feature URIs identify which embedding index to search. They follow the pattern:
mixpeek://{extractor_name}@{version}/{output_name}
Examples:
mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding - Multimodal text/image embeddings
mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1 - Text-only embeddings
mixpeek://image_extractor@v1/embedding - Image embeddings
mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding - Video frame embeddings (the same multimodal index as above, applied to frames extracted from video)
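Because every feature URI has the same fixed shape, it can be split into its parts with a regular expression, which is handy for validating URIs before a pipeline runs. The sketch below is a hypothetical client-side helper: `parse_feature_uri` and `FeatureURI` are illustrative names, not part of the Mixpeek API, and the character sets accepted for each component are an assumption.

```python
import re
from typing import NamedTuple

# Shape: mixpeek://{extractor_name}@{version}/{output_name}
# Assumption: each component is limited to letters, digits, "_", "-" (and "." in the version).
FEATURE_URI_RE = re.compile(
    r"^mixpeek://(?P<extractor>[A-Za-z0-9_\-]+)"
    r"@(?P<version>[A-Za-z0-9_.\-]+)"
    r"/(?P<output>[A-Za-z0-9_\-]+)$"
)


class FeatureURI(NamedTuple):
    extractor_name: str
    version: str
    output_name: str


def parse_feature_uri(uri: str) -> FeatureURI:
    """Split a feature URI into its three components; raise ValueError if malformed."""
    match = FEATURE_URI_RE.match(uri)
    if match is None:
        raise ValueError(f"not a valid feature URI: {uri!r}")
    return FeatureURI(match["extractor"], match["version"], match["output"])


uri = parse_feature_uri("mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1")
print(uri.extractor_name, uri.version, uri.output_name)
# text_extractor v1 multilingual_e5_large_instruct_v1
```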
Fusion Strategies
When searching multiple features, results are combined using fusion:
| Strategy | Description | Best For |
|---|---|---|
| rrf | Reciprocal Rank Fusion | General purpose, balanced results |
| dbsf | Distribution-Based Score Fusion | When scores have different distributions |
| weighted | Weighted combination | When you know the relative importance of each search |
| max | Maximum score wins | When any match is sufficient |
| learned | ML-based fusion | Optimized from interaction data |
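To make the rank-based and score-based strategies concrete, here is a minimal in-memory sketch of three of them, assuming each search yields a best-first list of `(document_id, score)` pairs. It is not Mixpeek's implementation: the RRF constant `k=60` is a commonly used default rather than a documented value, and the DBSF variant shown (rescale each list using mean ± 3 standard deviations as the bounds, clip to [0, 1], then sum) is one common formulation. `learned` fusion depends on a trained model and is not sketched.

```python
from collections import defaultdict
from statistics import mean, pstdev

# One search's output: (document_id, score) pairs ordered best-first.
Ranked = list[tuple[str, float]]


def _sorted(fused: dict[str, float]) -> Ranked:
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


def rrf(result_lists: list[Ranked], k: int = 60) -> Ranked:
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank); raw scores are ignored."""
    fused: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, (doc_id, _score) in enumerate(results, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return _sorted(fused)


def dbsf(result_lists: list[Ranked]) -> Ranked:
    """Distribution-Based Score Fusion: rescale each list so mean - 3*std maps to 0
    and mean + 3*std maps to 1 (clipping outliers), then sum the rescaled scores."""
    fused: dict[str, float] = defaultdict(float)
    for results in result_lists:
        if not results:
            continue
        scores = [score for _, score in results]
        mu, sigma = mean(scores), pstdev(scores)
        low, high = mu - 3 * sigma, mu + 3 * sigma
        for doc_id, score in results:
            # With zero spread every score is identical; map them to the midpoint.
            norm = 0.5 if high == low else (score - low) / (high - low)
            fused[doc_id] += min(max(norm, 0.0), 1.0)
    return _sorted(fused)


def max_fusion(result_lists: list[Ranked]) -> Ranked:
    """Max fusion: a document keeps its single best score across all searches."""
    fused: dict[str, float] = {}
    for results in result_lists:
        for doc_id, score in results:
            fused[doc_id] = max(score, fused.get(doc_id, float("-inf")))
    return _sorted(fused)


text_hits = [("a", 0.9), ("b", 0.8), ("c", 0.5)]
image_hits = [("b", 0.95), ("d", 0.7)]
print([doc for doc, _ in rrf([text_hits, image_hits])])  # ['b', 'a', 'd', 'c']
```

RRF only looks at positions, which is why it is a safe default when the searches produce scores on unrelated scales; DBSF and max compare score values, so they are more sensitive to how each index distributes its similarities.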
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| searches | array | Required | Array of search configurations |
| final_top_k | integer | 25 | Total results to return after fusion |
| fusion | string | rrf | Fusion strategy used when more than one search is configured |
| group_by | object | null | Group results by field |
| facets | array | null | Fields to compute facet counts for |
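Since every top-level parameter except `searches` has a default, a minimal stage only needs the search list. As an illustration, the hypothetical helper below (not part of any Mixpeek SDK) assembles a stage dictionary with those defaults, checks `fusion` against the five documented strategies, and omits `group_by` and `facets` unless they are set, mirroring their `null` defaults.

```python
import json
from typing import Any, Optional

# The five strategies listed in the Fusion Strategies table.
FUSION_STRATEGIES = {"rrf", "dbsf", "weighted", "max", "learned"}


def build_feature_search_stage(
    searches: list[dict[str, Any]],
    final_top_k: int = 25,
    fusion: str = "rrf",
    group_by: Optional[dict[str, Any]] = None,
    facets: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build a feature_search stage using the documented defaults."""
    if not searches:
        raise ValueError("searches is required and must not be empty")
    if fusion not in FUSION_STRATEGIES:
        raise ValueError(f"unknown fusion strategy: {fusion!r}")
    if final_top_k < 1:
        raise ValueError("final_top_k must be a positive integer")
    parameters: dict[str, Any] = {
        "searches": searches,
        "final_top_k": final_top_k,
        "fusion": fusion,
    }
    # group_by and facets default to null, so they are only emitted when set.
    if group_by is not None:
        parameters["group_by"] = group_by
    if facets is not None:
        parameters["facets"] = facets
    return {"stage_type": "filter", "stage_id": "feature_search", "parameters": parameters}


stage = build_feature_search_stage(
    [{"feature_uri": "mixpeek://image_extractor@v1/embedding", "query": "{{INPUT.image}}"}],
    facets=["metadata.category"],
)
print(json.dumps(stage, indent=2))
```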
Search Object Parameters
Each item in the searches array supports:
| Parameter | Type | Default | Description |
|---|---|---|---|
| feature_uri | string | Required | Embedding index to search |
| query | string/object | Required | Query text or embedding |
| top_k | integer | 100 | Candidates per search |
| filters | object | null | Pre-filter conditions |
| weight | number | 1.0 | Weight for fusion (weighted strategy only) |
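The `weight` field only matters for the `weighted` strategy. A minimal sketch of a weighted combination, assuming each search contributes `weight * score` for every document it returned and that raw scores are summed without normalization (Mixpeek may normalize first; that detail is not documented here). The `results` key is a hypothetical stand-in for one search's hits.

```python
from collections import defaultdict
from typing import Any


def weighted_fusion(searches: list[dict[str, Any]]) -> list[tuple[str, float]]:
    """Sum weight * score per document across searches; weight defaults to 1.0.
    A document missing from a search contributes nothing for that search.
    Assumption: raw scores are combined as-is (no normalization)."""
    fused: dict[str, float] = defaultdict(float)
    for search in searches:
        weight = search.get("weight", 1.0)
        for doc_id, score in search["results"]:
            fused[doc_id] += weight * score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


text = {"weight": 0.7, "results": [("a", 0.9), ("b", 0.6)]}
image = {"weight": 0.3, "results": [("b", 0.95), ("c", 0.8)]}
print([doc for doc, _ in weighted_fusion([text, image])])  # ['b', 'a', 'c']
```

Shifting the weights to 0.9 and 0.1 moves `a` to the top, because it only scores well in the text search; that sensitivity is why `weighted` fits cases where the relative importance of each search is known.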
Configuration Examples
Basic Text Search
{
"stage_type" : "filter" ,
"stage_id" : "feature_search" ,
"parameters" : {
"searches" : [
{
"feature_uri" : "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding" ,
"query" : "{{INPUT.query}}" ,
"top_k" : 100
}
],
"final_top_k" : 25
}
}
Grouping (Decompose/Recompose)
When documents are decomposed into chunks (e.g., video frames, document pages), use group_by to recompose results by parent:
{
"group_by" : {
"field" : "metadata.parent_id" ,
"limit" : 10 ,
"group_size" : 3
}
}
| Parameter | Description |
|---|---|
| field | Field to group by (e.g., parent document ID) |
| limit | Maximum number of groups to return |
| group_size | Maximum documents per group |
Use cases:
Video search: Group frames by video, return top 3 frames per video
Document search: Group chunks by document, return best chunks per doc
Product search: Group variants by product family
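The recompose step can be pictured as a single pass over score-ranked hits. The sketch below is an in-memory illustration, not Mixpeek's implementation: groups are ordered by their best-scoring member, new parents are ignored once `limit` groups exist, and each group keeps its best `group_size` documents. `group_results` and `get_path` are hypothetical names, and collecting documents that lack the field under a `None` group is a choice of this sketch.

```python
from typing import Any


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Resolve a dotted field path such as 'metadata.video_id'; None if any key is missing."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def group_results(
    results: list[dict[str, Any]], field: str, limit: int, group_size: int
) -> list[dict[str, Any]]:
    """Recompose chunk-level hits into parent groups, best group first."""
    groups: dict[Any, list[dict[str, Any]]] = {}  # dicts keep insertion order
    for doc in sorted(results, key=lambda d: d["score"], reverse=True):
        key = get_path(doc, field)
        if key not in groups:
            if len(groups) >= limit:
                continue  # group quota reached; ignore hits from new parents
            groups[key] = []
        if len(groups[key]) < group_size:
            groups[key].append(doc)
    return [{"group": key, "documents": docs} for key, docs in groups.items()]


frames = [
    {"document_id": "f1", "score": 0.90, "metadata": {"video_id": "v1"}},
    {"document_id": "f2", "score": 0.85, "metadata": {"video_id": "v2"}},
    {"document_id": "f3", "score": 0.80, "metadata": {"video_id": "v1"}},
    {"document_id": "f4", "score": 0.70, "metadata": {"video_id": "v1"}},
    {"document_id": "f5", "score": 0.60, "metadata": {"video_id": "v3"}},
]
for group in group_results(frames, "metadata.video_id", limit=2, group_size=2):
    print(group["group"], [d["document_id"] for d in group["documents"]])
# v1 ['f1', 'f3']
# v2 ['f2']
```

Frame `f4` is dropped because video `v1` already holds two frames, and `f5` is dropped because two groups already exist.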
Faceted Search
Get counts of results by field values for building filter UIs:
{
"facets" : [ "metadata.category" , "metadata.brand" , "metadata.price_range" ]
}
Response includes:
{
"facets" : {
"metadata.category" : [
{ "value" : "electronics" , "count" : 45 },
{ "value" : "clothing" , "count" : 23 }
],
"metadata.brand" : [
{ "value" : "Apple" , "count" : 12 },
{ "value" : "Samsung" , "count" : 8 }
]
}
}
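To show what the counts mean, the sketch below produces the same response shape from an in-memory list of results. Whether Mixpeek counts facets over all matching documents or only over the returned results is not stated here, so the scope is an assumption; `compute_facets` is a hypothetical name, and it assumes facet fields hold scalar (hashable) values.

```python
from collections import Counter
from typing import Any


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Resolve a dotted field path such as 'metadata.brand'; None if any key is missing."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def compute_facets(results: list[dict[str, Any]], fields: list[str]) -> dict[str, list[dict[str, Any]]]:
    """Count distinct values per field, most frequent first."""
    facets: dict[str, list[dict[str, Any]]] = {}
    for field in fields:
        counts: Counter = Counter()
        for doc in results:
            value = get_path(doc, field)
            if value is not None:  # documents without the field are not counted
                counts[value] += 1
        facets[field] = [{"value": v, "count": c} for v, c in counts.most_common()]
    return facets


results = [
    {"metadata": {"category": "electronics", "brand": "Apple"}},
    {"metadata": {"category": "electronics", "brand": "Samsung"}},
    {"metadata": {"category": "clothing"}},
]
print(compute_facets(results, ["metadata.category"]))
# {'metadata.category': [{'value': 'electronics', 'count': 2}, {'value': 'clothing', 'count': 1}]}
```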
Filter Syntax
Pre-filters use boolean logic with AND/OR/NOT:
{
"filters" : {
"AND" : [
{ "field" : "metadata.status" , "operator" : "eq" , "value" : "active" },
{
"OR" : [
{ "field" : "metadata.category" , "operator" : "eq" , "value" : "tech" },
{ "field" : "metadata.category" , "operator" : "eq" , "value" : "science" }
]
}
]
}
}
Supported Operators
| Operator | Description | Example |
|---|---|---|
| eq | Equals | {"field": "status", "operator": "eq", "value": "active"} |
| ne | Not equals | {"field": "status", "operator": "ne", "value": "deleted"} |
| gt | Greater than | {"field": "price", "operator": "gt", "value": 100} |
| gte | Greater than or equal | {"field": "rating", "operator": "gte", "value": 4} |
| lt | Less than | {"field": "age", "operator": "lt", "value": 30} |
| lte | Less than or equal | {"field": "count", "operator": "lte", "value": 10} |
| in | In array | {"field": "category", "operator": "in", "value": ["a", "b"]} |
| nin | Not in array | {"field": "status", "operator": "nin", "value": ["deleted", "archived"]} |
| contains | Contains substring | {"field": "title", "operator": "contains", "value": "guide"} |
| exists | Field exists | {"field": "metadata.optional", "operator": "exists", "value": true} |
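The filter grammar is a small recursive tree: AND and OR nodes hold lists of children, and leaves compare one field with one operator. The evaluator below is an in-memory sketch of those semantics for checking a filter against sample documents; Mixpeek applies filters at the index level, so this is not its implementation. The `{"NOT": <condition>}` shape is an assumption (the example above only shows AND and OR), as is the rule that an absent field fails every comparison except `exists`.

```python
import operator
from typing import Any, Callable

_MISSING = object()  # sentinel distinguishing "field absent" from a null value


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Resolve a dotted path like 'metadata.status'; _MISSING if any key is absent."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return _MISSING
        value = value[key]
    return value


COMPARATORS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda field_value, options: field_value in options,
    "nin": lambda field_value, options: field_value not in options,
    "contains": lambda field_value, needle: isinstance(field_value, str) and needle in field_value,
}


def matches(doc: dict[str, Any], condition: dict[str, Any]) -> bool:
    """Evaluate an AND / OR / NOT tree, or a single field condition, against doc."""
    if "AND" in condition:
        return all(matches(doc, child) for child in condition["AND"])
    if "OR" in condition:
        return any(matches(doc, child) for child in condition["OR"])
    if "NOT" in condition:  # assumed shape: {"NOT": <condition>}
        return not matches(doc, condition["NOT"])
    op = condition["operator"]
    value = get_path(doc, condition["field"])
    if op == "exists":
        return (value is not _MISSING) == bool(condition["value"])
    if op not in COMPARATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    if value is _MISSING:
        return False  # assumption: absent fields never satisfy a comparison
    return COMPARATORS[op](value, condition["value"])


example_filter = {
    "AND": [
        {"field": "metadata.status", "operator": "eq", "value": "active"},
        {"OR": [
            {"field": "metadata.category", "operator": "eq", "value": "tech"},
            {"field": "metadata.category", "operator": "eq", "value": "science"},
        ]},
    ]
}
print(matches({"metadata": {"status": "active", "category": "science"}}, example_filter))  # True
```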
Performance
| Metric | Value |
|---|---|
| Latency (single search) | 10-50ms |
| Latency (multi-search with fusion) | 20-80ms |
| Optimal top_k | 100-500 per search |
| Maximum top_k | 10,000 per search |
| Fusion overhead | < 5ms |
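For best performance, use pre-filters (the filters field of each search) to reduce the search space. Filtering at the vector index level is much faster than post-filtering in later stages. It also means top_k counts only documents that pass the filter: a later attribute_filter stage can only discard candidates that were already retrieved, so it may leave fewer results than requested.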
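The result-count difference is easiest to see with a toy brute-force search. The sketch below ranks hand-made 2-D vectors by dot product (hypothetical data, not a real index) and applies an `in_stock` predicate either before ranking (pre-filter) or to the top-k candidates afterwards (post-filter).

```python
# Toy corpus: 2-D "embeddings" plus an in_stock flag (hypothetical data).
docs = [
    {"id": "d1", "vector": [1.0, 0.0], "metadata": {"in_stock": False}},
    {"id": "d2", "vector": [0.9, 0.1], "metadata": {"in_stock": False}},
    {"id": "d3", "vector": [0.8, 0.2], "metadata": {"in_stock": True}},
    {"id": "d4", "vector": [0.1, 0.9], "metadata": {"in_stock": True}},
    {"id": "d5", "vector": [0.0, 1.0], "metadata": {"in_stock": True}},
    {"id": "d6", "vector": [0.85, 0.15], "metadata": {"in_stock": False}},
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(query, corpus, k, predicate=None):
    """Brute-force top-k by dot product, optionally restricted to docs passing predicate."""
    candidates = corpus if predicate is None else [d for d in corpus if predicate(d)]
    return sorted(candidates, key=lambda d: dot(query, d["vector"]), reverse=True)[:k]


def in_stock(doc):
    return doc["metadata"]["in_stock"]


query = [1.0, 0.0]
pre = search(query, docs, k=3, predicate=in_stock)  # filter, then rank
post = [d for d in search(query, docs, k=3) if in_stock(d)]  # rank, then filter
print([d["id"] for d in pre])   # ['d3', 'd4', 'd5']
print([d["id"] for d in post])  # []
```

The three nearest neighbours are all out of stock, so post-filtering returns nothing, while pre-filtering still fills all three slots with in-stock items.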
Common Pipeline Patterns
Basic Search + Rerank
[
{
"stage_type" : "filter" ,
"stage_id" : "feature_search" ,
"parameters" : {
"searches" : [
{
"feature_uri" : "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding" ,
"query" : "{{INPUT.query}}" ,
"top_k" : 100
}
],
"final_top_k" : 50
}
},
{
"stage_type" : "sort" ,
"stage_id" : "rerank" ,
"parameters" : {
"model" : "bge-reranker-v2-m3" ,
"top_n" : 10
}
}
]
Multimodal Search + Filter + Limit
[
{
"stage_type" : "filter" ,
"stage_id" : "feature_search" ,
"parameters" : {
"searches" : [
{
"feature_uri" : "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding" ,
"query" : "{{INPUT.query}}" ,
"top_k" : 100
},
{
"feature_uri" : "mixpeek://image_extractor@v1/embedding" ,
"query" : "{{INPUT.image}}" ,
"top_k" : 100
}
],
"fusion" : "rrf" ,
"final_top_k" : 50
}
},
{
"stage_type" : "filter" ,
"stage_id" : "attribute_filter" ,
"parameters" : {
"field" : "metadata.in_stock" ,
"operator" : "eq" ,
"value" : true
}
},
{
"stage_type" : "reduce" ,
"stage_id" : "sample" ,
"parameters" : {
"limit" : 20
}
}
]
Video Search with Frame Grouping
[
{
"stage_type" : "filter" ,
"stage_id" : "feature_search" ,
"parameters" : {
"searches" : [
{
"feature_uri" : "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding" ,
"query" : "{{INPUT.query}}" ,
"top_k" : 500
}
],
"group_by" : {
"field" : "metadata.video_id" ,
"limit" : 10 ,
"group_size" : 5
},
"final_top_k" : 50
}
},
{
"stage_type" : "reduce" ,
"stage_id" : "summarize" ,
"parameters" : {
"model" : "gpt-4o-mini" ,
"prompt" : "Summarize why these video segments match the query"
}
}
]
E-commerce Search with Facets
[
{
"stage_type" : "filter" ,
"stage_id" : "feature_search" ,
"parameters" : {
"searches" : [
{
"feature_uri" : "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding" ,
"query" : "{{INPUT.query}}" ,
"top_k" : 200 ,
"filters" : {
"AND" : [
{ "field" : "metadata.in_stock" , "operator" : "eq" , "value" : true },
{ "field" : "metadata.price" , "operator" : "lte" , "value" : "{{INPUT.max_price}}" }
]
}
}
],
"facets" : [ "metadata.category" , "metadata.brand" , "metadata.color" ],
"final_top_k" : 50
}
},
{
"stage_type" : "sort" ,
"stage_id" : "sort_attribute" ,
"parameters" : {
"sort_field" : "{{INPUT.sort_by}}" ,
"order" : "{{INPUT.sort_order}}"
}
}
]
Output Schema
Each result includes:
| Field | Type | Description |
|---|---|---|
| document_id | string | Unique document identifier |
| score | float | Combined (fused) similarity score |
| content | string | Document content |
| metadata | object | Document metadata |
| features | object | Feature data and scores per search |
Example output:
{
"document_id" : "doc_abc123" ,
"score" : 0.892 ,
"content" : "Document content here..." ,
"metadata" : {
"title" : "Example Document" ,
"category" : "tech" ,
"created_at" : "2024-01-15T10:30:00Z"
},
"features" : {
"mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding" : {
"score" : 0.91
},
"mixpeek://image_extractor@v1/embedding" : {
"score" : 0.87
}
}
}
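When debugging a multi-search pipeline it can help to see which search drove a result. The hypothetical helper below reads the features object of a single result shaped like the example above (trimmed to the relevant fields) and orders the per-search scores. It only reorganizes what the response contains; how the top-level score is derived from the per-search scores depends on the fusion strategy and is not reconstructed here.

```python
from typing import Any


def feature_breakdown(result: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (feature_uri, score) pairs for one result, highest score first.
    Entries without a numeric "score" are skipped."""
    pairs = [
        (uri, float(data["score"]))
        for uri, data in result.get("features", {}).items()
        if isinstance(data, dict) and isinstance(data.get("score"), (int, float))
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


result = {
    "document_id": "doc_abc123",
    "score": 0.892,
    "features": {
        "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding": {"score": 0.91},
        "mixpeek://image_extractor@v1/embedding": {"score": 0.87},
    },
}
for uri, score in feature_breakdown(result):
    print(f"{score:.2f}  {uri}")
```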
Comparison: feature_search vs attribute_filter
| Aspect | feature_search | attribute_filter |
|---|---|---|
| Purpose | Semantic similarity | Exact matching |
| Input | Query text/embedding | Field conditions |
| Scoring | Vector similarity | Binary match |
| Speed | 10-50ms | 5-20ms |
| Use when | Finding similar content | Filtering by metadata |
Error Handling
| Error | Behavior |
|---|---|
| Invalid feature_uri | Stage fails with error |
| Empty query | Returns empty results |
| Filter syntax error | Stage fails with error |
| No matching documents | Returns empty results |