[Figure: Feature Search stage showing multi-vector semantic search with fusion]
The Feature Search stage is the primary search stage for retrieval pipelines. It performs vector similarity search across one or more embedding features, supporting single-modal, multimodal, and hybrid search patterns. Results from multiple searches are fused using configurable strategies (RRF, DBSF, weighted, max, or learned).
Stage Category: FILTER (retrieves documents)
Transformation: 0 documents → N documents (retrieves from the collection based on vector similarity)

When to Use

Use Case | Description
--- | ---
Semantic search | Find documents similar in meaning to a query
Image search | Search by image embeddings
Video search | Search by video frame embeddings
Multimodal search | Combine text + image + video in one query
Hybrid search | Fuse results from multiple embedding types
Decompose/recompose | Group results by parent document
Faceted search | Get result counts by field values

When NOT to Use

Scenario | Recommended Alternative
--- | ---
Exact field matching | attribute_filter
Full-text keyword search | Combine with text features
No embeddings in collection | attribute_filter
Post-search filtering only | attribute_filter, placed after feature_search

Core Concepts

Feature URIs

Feature URIs identify which embedding index to search. They follow the pattern:
mixpeek://{extractor_name}@{version}/{output_name}
Examples:
  • mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding - Multimodal text/image embeddings (also used for video frame embeddings)
  • mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1 - Text-only embeddings
  • mixpeek://image_extractor@v1/embedding - Image embeddings
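To make the pattern concrete, here is a minimal Python sketch that splits a feature URI into its three parts. The `parse_feature_uri` helper and its regular expression are hypothetical, written only from the pattern above; they are not part of a Mixpeek SDK, and the service may accept characters this regex rejects.

```python
import re
from typing import NamedTuple

# Hypothetical helper mirroring the documented pattern:
#   mixpeek://{extractor_name}@{version}/{output_name}
FEATURE_URI_RE = re.compile(
    r"^mixpeek://(?P<extractor>[^@/]+)@(?P<version>[^/]+)/(?P<output>[^/]+)$"
)


class FeatureURI(NamedTuple):
    extractor_name: str
    version: str
    output_name: str


def parse_feature_uri(uri: str) -> FeatureURI:
    """Split a feature URI into extractor, version, and output name, or raise ValueError."""
    match = FEATURE_URI_RE.match(uri)
    if match is None:
        raise ValueError(f"not a valid feature URI: {uri!r}")
    return FeatureURI(match["extractor"], match["version"], match["output"])


uri = parse_feature_uri("mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1")
print(uri.extractor_name, uri.version, uri.output_name)
# text_extractor v1 multilingual_e5_large_instruct_v1
```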

Fusion Strategies

When searching multiple features, results are combined using fusion:
Strategy | Description | Best For
--- | --- | ---
rrf | Reciprocal Rank Fusion | General purpose, balanced results
dbsf | Distribution-Based Score Fusion | When scores have different distributions
weighted | Weighted combination | When you know relative importance
max | Maximum score wins | When any match is sufficient
learned | ML-based fusion | Optimized from interaction data
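The sketch below illustrates three of these strategies on plain Python lists of `(document_id, score)` pairs. It is an illustration under stated assumptions, not Mixpeek's implementation: the RRF constant `k=60` comes from the original RRF paper, the DBSF variant follows the mean ± 3 standard deviations normalization popularized by Qdrant, and `learned` fusion is omitted because it depends on interaction data.

```python
import statistics
from collections import defaultdict

# A search result list is [(document_id, score), ...], ordered best first.
Results = list[tuple[str, float]]


def rrf(result_lists: list[Results], k: int = 60) -> dict[str, float]:
    """Reciprocal Rank Fusion: add 1 / (k + rank) for every list a document appears in.

    k=60 is the constant from the original RRF paper; Mixpeek's value is an assumption.
    Raw scores are ignored; only ranks matter.
    """
    fused: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, (doc_id, _score) in enumerate(results, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return dict(fused)


def dbsf(result_lists: list[Results]) -> dict[str, float]:
    """Distribution-Based Score Fusion (Qdrant-style, assumed): rescale each list using
    mean -/+ 3 standard deviations as its bounds, then sum per document."""
    fused: dict[str, float] = defaultdict(float)
    for results in result_lists:
        if not results:
            continue
        scores = [score for _, score in results]
        mean = statistics.fmean(scores)
        spread = 3 * statistics.pstdev(scores)
        for doc_id, score in results:
            # With zero spread (all scores equal) every document gets the midpoint.
            norm = 0.5 if spread == 0 else (score - (mean - spread)) / (2 * spread)
            fused[doc_id] += norm
    return dict(fused)


def max_fusion(result_lists: list[Results]) -> dict[str, float]:
    """Keep each document's single best raw score across all searches."""
    fused: dict[str, float] = {}
    for results in result_lists:
        for doc_id, score in results:
            fused[doc_id] = max(score, fused.get(doc_id, float("-inf")))
    return fused


def top_k(fused: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring document IDs."""
    return sorted(fused, key=fused.get, reverse=True)[:k]


text_hits = [("a", 0.91), ("b", 0.85), ("c", 0.40)]
image_hits = [("b", 0.88), ("d", 0.70)]
print(top_k(rrf([text_hits, image_hits]), 3))  # ['b', 'a', 'd']
```

Because RRF uses only ranks, document `b` wins here by appearing in both lists, even though its best raw score (0.88) is lower than `a`'s 0.91; `max_fusion` on the same input would put `a` first.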

Parameters

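Parameter | Type | Default | Description
--- | --- | --- | ---
searches | array | Required | Array of search configurations
final_top_k | integer | 25 | Total results to return after fusion
fusion | string | rrf | Fusion strategy when searches has more than one entry (rrf, dbsf, weighted, max, or learned)
group_by | object | null | Group results by a field
facets | array | null | Fields to compute facet counts for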

Search Object Parameters

Each item in the searches array supports:
Parameter | Type | Default | Description
--- | --- | --- | ---
feature_uri | string | Required | Embedding index to search
query | string/object | Required | Query text or embedding
top_k | integer | 100 | Candidates per search
filters | object | null | Pre-filter conditions
weight | number | 1.0 | Weight for fusion (used by the weighted strategy)
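The `weight` field only takes effect under the `weighted` strategy. A minimal sketch of what that might look like, assuming each search's scores are first min-max normalized to [0, 1] (the normalization Mixpeek applies is not documented here) and that weighted scores are summed per document before truncating to `final_top_k`:

```python
from collections import defaultdict

# Hypothetical input shaped like the "searches" array after retrieval: each entry
# carries its optional weight and the (document_id, score) candidates it returned.
searches = [
    {"weight": 3.0, "hits": [("a", 0.90), ("b", 0.60), ("c", 0.30)]},
    {"hits": [("b", 0.80), ("d", 0.50)]},  # weight omitted, so the 1.0 default applies
]


def min_max(hits):
    """Rescale one search's raw scores to [0, 1] so different indexes are comparable."""
    scores = [score for _, score in hits]
    low, high = min(scores), max(scores)
    if high == low:
        return {doc_id: 1.0 for doc_id, _ in hits}
    return {doc_id: (score - low) / (high - low) for doc_id, score in hits}


def weighted_fusion(searches, final_top_k=25):
    """Sum weight * normalized score per document, then keep the final_top_k best."""
    fused = defaultdict(float)
    for search in searches:
        if not search["hits"]:
            continue
        weight = search.get("weight", 1.0)
        for doc_id, norm in min_max(search["hits"]).items():
            fused[doc_id] += weight * norm
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ranked[:final_top_k]


print([doc_id for doc_id, _ in weighted_fusion(searches, final_top_k=2)])  # ['a', 'b']
```

Normalizing first matters because raw similarity scores from different embedding indexes are not on the same scale; without it, each weight would be entangled with its index's score range.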

Configuration Examples

{
  "stage_type": "filter",
  "stage_id": "feature_search",
  "parameters": {
    "searches": [
      {
        "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
        "query": "{{INPUT.query}}",
        "top_k": 100
      }
    ],
    "final_top_k": 25
  }
}
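If you generate stage configurations in code, a small builder keeps them consistent with the defaults in the tables above. `feature_search_stage` below is a hypothetical helper, not an SDK function; it only assembles the JSON shown above and checks the required keys.

```python
import json


def feature_search_stage(searches, final_top_k=25, fusion="rrf", group_by=None, facets=None):
    """Hypothetical helper: build a feature_search stage dict using the documented defaults."""
    if not searches:
        raise ValueError("searches is required and must contain at least one search")
    normalized = []
    for search in searches:
        missing = {"feature_uri", "query"} - set(search)
        if missing:
            raise ValueError(f"search is missing required keys: {sorted(missing)}")
        entry = dict(search)
        entry.setdefault("top_k", 100)  # candidates per search
        normalized.append(entry)
    parameters = {"searches": normalized, "final_top_k": final_top_k}
    if len(normalized) > 1:
        parameters["fusion"] = fusion  # only meaningful when several searches are fused
    if group_by is not None:
        parameters["group_by"] = group_by
    if facets is not None:
        parameters["facets"] = facets
    return {"stage_type": "filter", "stage_id": "feature_search", "parameters": parameters}


stage = feature_search_stage([
    {
        "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
        "query": "{{INPUT.query}}",
    }
])
print(json.dumps(stage, indent=2))
```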

Grouping (Decompose/Recompose)

When documents are decomposed into chunks (e.g., video frames, document pages), use group_by to recompose results by parent:
{
  "group_by": {
    "field": "metadata.parent_id",
    "limit": 10,
    "group_size": 3
  }
}
Parameter | Description
--- | ---
field | Field to group by (e.g., parent document ID)
limit | Maximum number of groups to return
group_size | Maximum documents per group
Use cases:
  • Video search: Group frames by video, return top 3 frames per video
  • Document search: Group chunks by document, return best chunks per doc
  • Product search: Group variants by product family
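The recomposition step can be sketched in Python as follows. This is an assumption about the semantics, not Mixpeek's code: results are visited in descending score order, groups are ranked by their best-scoring document, each group keeps at most `group_size` documents, and only the first `limit` groups are kept. The `group_results` and `get_path` helpers and the output shape (`group` / `documents`) are hypothetical; documents lacking the field fall into a `None` group.

```python
def get_path(doc, dotted):
    """Resolve a dotted field path such as "metadata.parent_id"; None if absent."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def group_results(results, field, limit, group_size):
    """Group results by a field, keeping at most `limit` groups of `group_size` docs."""
    groups = {}  # insertion order = order of each group's best-scoring document
    for doc in sorted(results, key=lambda d: d["score"], reverse=True):
        key = get_path(doc, field)
        if key not in groups:
            if len(groups) >= limit:
                continue  # already holding `limit` groups; skip new parents
            groups[key] = []
        if len(groups[key]) < group_size:
            groups[key].append(doc)
    return [{"group": key, "documents": docs} for key, docs in groups.items()]


frames = [
    {"document_id": "f1", "score": 0.9, "metadata": {"parent_id": "vid_a"}},
    {"document_id": "f2", "score": 0.8, "metadata": {"parent_id": "vid_b"}},
    {"document_id": "f3", "score": 0.7, "metadata": {"parent_id": "vid_a"}},
    {"document_id": "f4", "score": 0.6, "metadata": {"parent_id": "vid_a"}},
    {"document_id": "f5", "score": 0.5, "metadata": {"parent_id": "vid_c"}},
]
for group in group_results(frames, "metadata.parent_id", limit=2, group_size=2):
    print(group["group"], [d["document_id"] for d in group["documents"]])
# vid_a ['f1', 'f3']
# vid_b ['f2']
```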

Faceted Search

Use facets to get result counts by field value, for example to build filter UIs:
{
  "facets": ["metadata.category", "metadata.brand", "metadata.price_range"]
}
Response includes:
{
  "facets": {
    "metadata.category": [
      {"value": "electronics", "count": 45},
      {"value": "clothing", "count": 23}
    ],
    "metadata.brand": [
      {"value": "Apple", "count": 12},
      {"value": "Samsung", "count": 8}
    ]
  }
}
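A minimal sketch of how facet counts like those above could be computed over a result set. It assumes counts are taken over the documents passed in and sorted by descending count; whether Mixpeek counts all matching documents or only the returned ones is not stated on this page. `compute_facets` and `get_path` are hypothetical helpers.

```python
from collections import Counter


def get_path(doc, dotted):
    """Resolve a dotted field path such as "metadata.category"; None if absent."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def compute_facets(results, fields):
    """Count each field's values, most common first, in the response shape shown above."""
    facets = {}
    for field in fields:
        counts = Counter()
        for doc in results:
            value = get_path(doc, field)
            if value is not None:  # documents without the field are not counted
                counts[value] += 1
        # most_common() sorts by count; equal counts keep first-seen order
        facets[field] = [{"value": v, "count": c} for v, c in counts.most_common()]
    return {"facets": facets}


results = [
    {"metadata": {"category": "electronics", "brand": "Apple"}},
    {"metadata": {"category": "clothing", "brand": "Samsung"}},
    {"metadata": {"category": "electronics"}},
]
print(compute_facets(results, ["metadata.category", "metadata.brand"]))
```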

Filter Syntax

Pre-filters are set per search in filters and combine conditions with AND, OR, and NOT:
{
  "filters": {
    "AND": [
      {"field": "metadata.status", "operator": "eq", "value": "active"},
      {
        "OR": [
          {"field": "metadata.category", "operator": "eq", "value": "tech"},
          {"field": "metadata.category", "operator": "eq", "value": "science"}
        ]
      }
    ]
  }
}
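Deeply nested filter JSON is easy to mistype. These small hypothetical helpers (`cond`, `all_of`, `any_of`) build exactly the filter above from Python; they are illustrative only and not part of a Mixpeek SDK.

```python
def cond(field, operator, value):
    """One leaf condition."""
    return {"field": field, "operator": operator, "value": value}


def all_of(*clauses):
    """AND: every clause must match."""
    return {"AND": list(clauses)}


def any_of(*clauses):
    """OR: at least one clause must match."""
    return {"OR": list(clauses)}


filters = all_of(
    cond("metadata.status", "eq", "active"),
    any_of(
        cond("metadata.category", "eq", "tech"),
        cond("metadata.category", "eq", "science"),
    ),
)
search = {
    "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
    "query": "{{INPUT.query}}",
    "filters": filters,
}
```

The OR of two `eq` conditions on the same field could also be written as a single `in` condition, `cond("metadata.category", "in", ["tech", "science"])`, which is shorter and matches the same documents.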

Supported Operators

Operator | Description | Example
--- | --- | ---
eq | Equals | {"field": "status", "operator": "eq", "value": "active"}
ne | Not equals | {"field": "status", "operator": "ne", "value": "deleted"}
gt | Greater than | {"field": "price", "operator": "gt", "value": 100}
gte | Greater than or equal | {"field": "rating", "operator": "gte", "value": 4}
lt | Less than | {"field": "age", "operator": "lt", "value": 30}
lte | Less than or equal | {"field": "count", "operator": "lte", "value": 10}
in | In array | {"field": "category", "operator": "in", "value": ["a", "b"]}
nin | Not in array | {"field": "status", "operator": "nin", "value": ["deleted", "archived"]}
contains | Contains substring | {"field": "title", "operator": "contains", "value": "guide"}
exists | Field exists | {"field": "metadata.optional", "operator": "exists", "value": true}
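To check locally which documents a filter would admit, the following sketch evaluates the boolean syntax and the operators above against plain dictionaries. It is a hypothetical evaluator, not how Mixpeek filters at the vector index; its handling of `NOT` (assumed shape `{"NOT": clause}`, since no NOT example appears on this page), of missing fields, and of type mismatches are all assumptions.

```python
_MISSING = object()


def get_path(doc, dotted):
    """Resolve a dotted field path such as "metadata.status"; _MISSING if absent."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value


OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "ne": lambda actual, expected: actual != expected,
    "gt": lambda actual, expected: actual > expected,
    "gte": lambda actual, expected: actual >= expected,
    "lt": lambda actual, expected: actual < expected,
    "lte": lambda actual, expected: actual <= expected,
    "in": lambda actual, expected: actual in expected,
    "nin": lambda actual, expected: actual not in expected,
    "contains": lambda actual, expected: isinstance(actual, str) and expected in actual,
}


def matches(doc, clause):
    """Evaluate a filter tree (AND / OR / NOT / leaf condition) against one document."""
    if "AND" in clause:
        return all(matches(doc, sub) for sub in clause["AND"])
    if "OR" in clause:
        return any(matches(doc, sub) for sub in clause["OR"])
    if "NOT" in clause:
        # Assumed shape {"NOT": clause}.
        return not matches(doc, clause["NOT"])
    actual = get_path(doc, clause["field"])
    if clause["operator"] == "exists":
        return (actual is not _MISSING) == bool(clause["value"])
    if actual is _MISSING:
        return False  # assumption: other operators never match a missing field
    try:
        return OPERATORS[clause["operator"]](actual, clause["value"])
    except TypeError:
        return False  # e.g. comparing a string field with gt 100


doc = {"metadata": {"status": "active", "category": "science", "price": 120}}
print(matches(doc, {"AND": [
    {"field": "metadata.status", "operator": "eq", "value": "active"},
    {"field": "metadata.price", "operator": "gt", "value": 100},
]}))  # True
```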

Performance

Metric | Value
--- | ---
Latency (single search) | 10-50 ms
Latency (multi-search with fusion) | 20-80 ms
Optimal top_k | 100-500 per search
Maximum top_k | 10,000 per search
Fusion overhead | < 5 ms
For best performance, use pre-filters to reduce the search space. Filtering at the vector index level is much faster than post-filtering in later stages.

Common Pipeline Patterns

Basic Search + Rerank

[
  {
    "stage_type": "filter",
    "stage_id": "feature_search",
    "parameters": {
      "searches": [
        {
          "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
          "query": "{{INPUT.query}}",
          "top_k": 100
        }
      ],
      "final_top_k": 50
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "rerank",
    "parameters": {
      "model": "bge-reranker-v2-m3",
      "top_n": 10
    }
  }
]

Multimodal Search + Filter + Limit

[
  {
    "stage_type": "filter",
    "stage_id": "feature_search",
    "parameters": {
      "searches": [
        {
          "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
          "query": "{{INPUT.query}}",
          "top_k": 100
        },
        {
          "feature_uri": "mixpeek://image_extractor@v1/embedding",
          "query": "{{INPUT.image}}",
          "top_k": 100
        }
      ],
      "fusion": "rrf",
      "final_top_k": 50
    }
  },
  {
    "stage_type": "filter",
    "stage_id": "attribute_filter",
    "parameters": {
      "field": "metadata.in_stock",
      "operator": "eq",
      "value": true
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "sample",
    "parameters": {
      "limit": 20
    }
  }
]

Video Search with Frame Grouping

[
  {
    "stage_type": "filter",
    "stage_id": "feature_search",
    "parameters": {
      "searches": [
        {
          "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
          "query": "{{INPUT.query}}",
          "top_k": 500
        }
      ],
      "group_by": {
        "field": "metadata.video_id",
        "limit": 10,
        "group_size": 5
      },
      "final_top_k": 50
    }
  },
  {
    "stage_type": "reduce",
    "stage_id": "summarize",
    "parameters": {
      "model": "gpt-4o-mini",
      "prompt": "Summarize why these video segments match the query"
    }
  }
]

E-commerce Search with Facets

[
  {
    "stage_type": "filter",
    "stage_id": "feature_search",
    "parameters": {
      "searches": [
        {
          "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
          "query": "{{INPUT.query}}",
          "top_k": 200,
          "filters": {
            "AND": [
              {"field": "metadata.in_stock", "operator": "eq", "value": true},
              {"field": "metadata.price", "operator": "lte", "value": "{{INPUT.max_price}}"}
            ]
          }
        }
      ],
      "facets": ["metadata.category", "metadata.brand", "metadata.color"],
      "final_top_k": 50
    }
  },
  {
    "stage_type": "sort",
    "stage_id": "sort_attribute",
    "parameters": {
      "sort_field": "{{INPUT.sort_by}}",
      "order": "{{INPUT.sort_order}}"
    }
  }
]
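The pipelines above reference request inputs with `{{INPUT.name}}` placeholders, including `{{INPUT.max_price}}` inside a numeric `lte` comparison. This page does not spell out how placeholders are resolved, so the sketch below is a hypothetical client-side renderer, not Mixpeek's mechanism. It assumes that a string consisting only of a placeholder is replaced by the input's raw value (so a price stays a number) and that placeholders embedded in longer strings are interpolated as text.

```python
import re

# Matches {{INPUT.name}}, tolerating spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*INPUT\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(node, inputs):
    """Recursively replace {{INPUT.name}} placeholders in a stage/pipeline structure."""
    if isinstance(node, dict):
        return {key: render(value, inputs) for key, value in node.items()}
    if isinstance(node, list):
        return [render(item, inputs) for item in node]
    if isinstance(node, str):
        whole = PLACEHOLDER.fullmatch(node)
        if whole:
            return inputs[whole.group(1)]  # keep the input's own type, e.g. a number
        return PLACEHOLDER.sub(lambda m: str(inputs[m.group(1)]), node)
    return node  # numbers, booleans, None pass through unchanged


stage = {
    "parameters": {
        "searches": [{
            "query": "{{INPUT.query}}",
            "filters": {"AND": [
                {"field": "metadata.price", "operator": "lte", "value": "{{INPUT.max_price}}"}
            ]},
        }]
    }
}
rendered = render(stage, {"query": "running shoes", "max_price": 150})
print(rendered["parameters"]["searches"][0]["filters"]["AND"][0]["value"])  # 150
```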

Output Schema

Each result includes:
Field | Type | Description
--- | --- | ---
document_id | string | Unique document identifier
score | float | Combined similarity score
content | string | Document content
metadata | object | Document metadata
features | object | Feature data and scores per search
Example output:
{
  "document_id": "doc_abc123",
  "score": 0.892,
  "content": "Document content here...",
  "metadata": {
    "title": "Example Document",
    "category": "tech",
    "created_at": "2024-01-15T10:30:00Z"
  },
  "features": {
    "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding": {
      "score": 0.91
    },
    "mixpeek://image_extractor@v1/embedding": {
      "score": 0.87
    }
  }
}

Comparison: feature_search vs attribute_filter

Aspect | feature_search | attribute_filter
--- | --- | ---
Purpose | Semantic similarity | Exact matching
Input | Query text/embedding | Field conditions
Scoring | Vector similarity | Binary match
Typical latency | 10-50 ms | 5-20 ms
Use when | Finding similar content | Filtering by metadata

Error Handling

Error | Behavior
--- | ---
Invalid feature_uri | Stage fails with error
Empty query | Returns empty results
Filter syntax error | Stage fails with error
No matching documents | Returns empty results