The Hybrid Search retriever stage combines semantic vector search with keyword-based search to balance precision and recall.

Overview

Hybrid Search combines semantic (vector) and lexical (keyword) search. Vector embeddings capture context and meaning, so a query can match relevant documents that use different wording. Keyword matching adds precision for specific terms such as product names, identifiers, or exact phrases. Together, the two methods typically return more robust results than either method alone.

Required Inputs

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | The search query text |
| k | integer | No | 10 | Number of results to retrieve |
| feature_store_id | string | Yes | - | ID of the feature store containing vector embeddings |
| index_id | string | Yes | - | ID of the keyword index |
| vector_weight | float | No | 0.5 | Weight given to vector search results (0.0-1.0) |
| keyword_weight | float | No | 0.5 | Weight given to keyword search results (0.0-1.0) |
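As a client-side illustration, the inputs above map naturally onto a small Python dataclass. This is a minimal sketch, not part of any documented SDK: the class name `HybridSearchInputs` is hypothetical, and the check that `k` is positive is an assumption. Only the field names, the defaults, and the 0.0-1.0 weight range come from the table.

```python
from dataclasses import dataclass


@dataclass
class HybridSearchInputs:
    """Hypothetical client-side container mirroring the input table."""

    query: str
    feature_store_id: str
    index_id: str
    k: int = 10
    vector_weight: float = 0.5
    keyword_weight: float = 0.5

    def __post_init__(self) -> None:
        # Required string inputs must be present and non-empty.
        for name in ("query", "feature_store_id", "index_id"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")
        # Assumption: k must be a positive integer.
        if self.k < 1:
            raise ValueError("k must be a positive integer")
        # Both weights are documented as lying in the range 0.0-1.0.
        for name in ("vector_weight", "keyword_weight"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0")
```

Constructing `HybridSearchInputs(query="...", feature_store_id="fs_embeddings_123", index_id="idx_docs_456")` fills in `k=10` and equal weights of 0.5, matching the table defaults.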

Configurations

Search Weighting

The hybrid search combines results from both methods using a weighted approach:

| Parameter | Description | Impact |
|---|---|---|
| vector_weight | Weight assigned to vector search results | Higher values favor semantic similarity |
| keyword_weight | Weight assigned to keyword search results | Higher values favor exact keyword matches |
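A linear weighted combination can be sketched in Python as follows. Two assumptions are not stated above and are labeled in the code: scores from both methods are already on comparable scales (for example, both normalized to 0-1), and a document returned by only one method scores 0.0 for the other. The helper `linear_combination` is hypothetical. With the weights from the output example (0.6 and 0.4), it reproduces that example's combined scores, e.g. 0.6 × 0.923 + 0.4 × 0.803 = 0.875.

```python
def linear_combination(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    vector_weight: float = 0.5,
    keyword_weight: float = 0.5,
) -> dict[str, float]:
    """Weighted average of per-document vector and keyword scores.

    Assumes both score sets are already on comparable scales.
    """
    total_weight = vector_weight + keyword_weight
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    combined: dict[str, float] = {}
    for doc_id in vector_scores.keys() | keyword_scores.keys():
        # Assumption: a document absent from one result list scores 0.0 there.
        v = vector_scores.get(doc_id, 0.0)
        s = keyword_scores.get(doc_id, 0.0)
        combined[doc_id] = (vector_weight * v + keyword_weight * s) / total_weight
    return combined


# Scores taken from the output example further down this page.
scores = linear_combination(
    {"doc_abc123": 0.923, "doc_def456": 0.791},
    {"doc_abc123": 0.803, "doc_def456": 0.919},
    vector_weight=0.6,
    keyword_weight=0.4,
)
```

Dividing by the weight sum keeps the result a true weighted average even when the two weights do not add up to 1.0.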

Merging Methods

| Method | Description | Use Case |
|---|---|---|
| linear_combination | Weighted average of both search scores | General purpose, balanced approach |
| reciprocal_rank_fusion | Combines result rankings rather than scores | When score scales differ significantly |
| cross_encoder_reranking | Uses a model to rerank combined results | When highest precision is required |
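Reciprocal rank fusion scores each document by its positions in the ranked lists rather than by raw scores, which is why it suits score scales that differ. The sketch below uses the standard RRF formula, a sum of 1 / (c + rank) over the lists. Applying `vector_weight` and `keyword_weight` as per-list multipliers and using c = 60 (a value common in the RRF literature) are assumptions, not documented behavior of this stage, and the function name is hypothetical.

```python
def reciprocal_rank_fusion(
    vector_ranking: list[str],
    keyword_ranking: list[str],
    vector_weight: float = 0.5,
    keyword_weight: float = 0.5,
    rank_constant: int = 60,
) -> list[tuple[str, float]]:
    """Fuse two ranked lists of document IDs using ranks only."""
    fused: dict[str, float] = {}
    for ranking, weight in (
        (vector_ranking, vector_weight),
        (keyword_ranking, keyword_weight),
    ):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list adds weight / (rank_constant + rank); top ranks add most.
            fused[doc_id] = fused.get(doc_id, 0.0) + weight / (rank_constant + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

Because only ranks enter the formula, a raw keyword relevance score and a cosine similarity never need to be reconciled onto one scale.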

Configuration Examples

Basic Hybrid Search
```json
{
  "k": 10,
  "vector_weight": 0.6,
  "keyword_weight": 0.4,
  "merging_method": "linear_combination",
  "feature_store_id": "fs_embeddings_123",
  "index_id": "idx_docs_456"
}
```
Advanced Configuration
```json
{
  "k": 25,
  "vector_weight": 0.7,
  "keyword_weight": 0.3,
  "merging_method": "cross_encoder_reranking",
  "reranker_model": "mixpeek/reranker-v1",
  "feature_store_id": "fs_embeddings_123",
  "index_id": "idx_docs_456",
  "min_score": 0.2,
  "vector_k": 50,
  "keyword_k": 50
}
```

Advanced Options

| Option | Type | Default | Description |
|---|---|---|---|
| min_score | float | 0.1 | Minimum combined score threshold for results |
| vector_k | integer | k * 3 | Number of candidates to retrieve from vector search |
| keyword_k | integer | k * 3 | Number of candidates to retrieve from keyword search |
| reranker_model | string | null | Model identifier for cross-encoder reranking |
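The defaults in this table, including the dependence of `vector_k` and `keyword_k` on `k`, can be resolved with a short Python helper. This is a hypothetical sketch: `resolve_options` is not a documented function, and the rule that `cross_encoder_reranking` requires a `reranker_model` is an assumption added for illustration.

```python
from typing import Any

# Defaults from the tables on this page; vector_k and keyword_k depend on k,
# so resolve_options fills them in separately.
STATIC_DEFAULTS: dict[str, Any] = {"k": 10, "min_score": 0.1, "reranker_model": None}


def resolve_options(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config with the documented defaults filled in."""
    resolved = {**STATIC_DEFAULTS, **config}
    k = resolved["k"]
    # Candidate pools default to three times the final result count.
    resolved.setdefault("vector_k", k * 3)
    resolved.setdefault("keyword_k", k * 3)
    # Assumption (hypothetical rule): reranking needs an explicit model.
    if (
        resolved.get("merging_method") == "cross_encoder_reranking"
        and not resolved["reranker_model"]
    ):
        raise ValueError("cross_encoder_reranking needs a reranker_model")
    return resolved
```

Applied to the Basic Hybrid Search configuration, this yields `vector_k` and `keyword_k` of 30; the Advanced Configuration keeps its explicit values of 50.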

Processing Flow
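The options documented above suggest a flow of over-fetching candidates from each retriever, merging them, optionally reranking, applying `min_score`, and keeping the top `k`. The Python sketch below walks through those steps under stated assumptions: the retrievers and reranker are hypothetical callables passed in by the caller, a document missing from one retriever scores 0.0 there, and `min_score` is applied before truncating to `k`. The actual order of operations in the stage may differ.

```python
from typing import Callable, Optional

# Hypothetical retriever signature: (query, limit) -> {document_id: score}.
Retriever = Callable[[str, int], dict[str, float]]
# Hypothetical reranker signature: (query, document_ids) -> {document_id: score}.
Reranker = Callable[[str, list[str]], dict[str, float]]


def hybrid_search(
    query: str,
    vector_search: Retriever,
    keyword_search: Retriever,
    k: int = 10,
    vector_weight: float = 0.5,
    keyword_weight: float = 0.5,
    min_score: float = 0.1,
    vector_k: Optional[int] = None,
    keyword_k: Optional[int] = None,
    rerank_fn: Optional[Reranker] = None,
) -> list[tuple[str, float]]:
    # 1. Over-fetch candidates from each retriever (default: k * 3 each).
    vector_scores = vector_search(query, vector_k if vector_k is not None else k * 3)
    keyword_scores = keyword_search(query, keyword_k if keyword_k is not None else k * 3)

    # 2. Merge with a weighted average; a missing score counts as 0.0 (assumption).
    total = vector_weight + keyword_weight
    combined = {
        doc: (
            vector_weight * vector_scores.get(doc, 0.0)
            + keyword_weight * keyword_scores.get(doc, 0.0)
        ) / total
        for doc in vector_scores.keys() | keyword_scores.keys()
    }

    # 3. Optionally replace the merged scores with reranker scores.
    if rerank_fn is not None:
        combined = rerank_fn(query, list(combined))

    # 4. Drop results under the threshold, then keep the k best.
    kept = [(doc, score) for doc, score in combined.items() if score >= min_score]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept[:k]
```

One plausible reason to over-fetch (`vector_k` and `keyword_k` larger than `k`) is that a document ranked moderately in both lists can still earn a high combined score, and fetching exactly `k` from each side could miss it.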

Output Schema

```json
{
  "results": [
    {
      "document_id": "doc_abc123",
      "collection_id": "col_xyz789",
      "combined_score": 0.875,
      "vector_score": 0.923,
      "keyword_score": 0.803,
      "metadata": {
        "title": "Hybrid Search Systems",
        "timestamp": "2023-06-12T10:15:43Z"
      },
      "content": "Hybrid search systems combine the strengths of multiple retrieval methods..."
    },
    {
      "document_id": "doc_def456",
      "collection_id": "col_xyz789",
      "combined_score": 0.842,
      "vector_score": 0.791,
      "keyword_score": 0.919,
      "metadata": {
        "title": "Implementing Keyword and Vector Search",
        "timestamp": "2023-07-04T16:30:22Z"
      },
      "content": "When implementing a search system, combining keyword and vector approaches..."
    }
  ],
  "metadata": {
    "query": "hybrid search implementation techniques",
    "total_results": 2,
    "processing_time_ms": 38.7,
    "vector_weight": 0.6,
    "keyword_weight": 0.4,
    "merging_method": "linear_combination"
  }
}
```
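In the example response, each `combined_score` equals `vector_weight * vector_score + keyword_weight * keyword_score` (for instance, 0.6 × 0.791 + 0.4 × 0.919 ≈ 0.842). The Python sketch below parses a trimmed copy of that response and checks this relationship for `linear_combination` results. The helper `check_linear_scores` and the 1e-3 tolerance (to absorb the three-decimal rounding in the example) are hypothetical choices, not part of the API.

```python
import json

# Trimmed copy of the example response above (content and timestamps omitted).
RESPONSE = """
{
  "results": [
    {"document_id": "doc_abc123", "combined_score": 0.875,
     "vector_score": 0.923, "keyword_score": 0.803},
    {"document_id": "doc_def456", "combined_score": 0.842,
     "vector_score": 0.791, "keyword_score": 0.919}
  ],
  "metadata": {"vector_weight": 0.6, "keyword_weight": 0.4,
               "merging_method": "linear_combination"}
}
"""


def check_linear_scores(response: dict, tolerance: float = 1e-3) -> list[str]:
    """Return IDs whose combined_score disagrees with the weighted average."""
    meta = response["metadata"]
    if meta["merging_method"] != "linear_combination":
        return []  # Other methods do not combine raw scores this way.
    wv, wk = meta["vector_weight"], meta["keyword_weight"]
    mismatched = []
    for hit in response["results"]:
        expected = (wv * hit["vector_score"] + wk * hit["keyword_score"]) / (wv + wk)
        if abs(expected - hit["combined_score"]) > tolerance:
            mismatched.append(hit["document_id"])
    return mismatched


response = json.loads(RESPONSE)
for hit in response["results"]:
    print(hit["document_id"], hit["combined_score"])
print("mismatched:", check_linear_scores(response))
```

Keeping `vector_score` and `keyword_score` alongside `combined_score` in each result is what makes this kind of client-side check, or a custom re-weighting, possible.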