Learned fusion automatically discovers the optimal blend of embedding features for your users. Instead of manually setting weights (text: 0.7, image: 0.3), the system learns from interaction data which features produce results users engage with.
Figure: Thompson Sampling. Beta distributions evolve from uniform to peaked as interactions accumulate.

How It Works

Learned fusion uses Thompson Sampling, a well-studied algorithm for the multi-armed bandit problem. Here’s how it applies to search fusion:

1. Initialize with uniform priors

Each search feature (e.g., text embeddings, image embeddings) starts with a Beta(1, 1) distribution — a flat line that assigns equal probability to all weight values. This means zero assumptions about which feature is better.

2. Sample weights at query time

When a query arrives, the system draws a random weight from each feature’s Beta distribution and normalizes them to sum to 1. Early on, samples are highly variable (exploration). As data accumulates, they stabilize (exploitation).

3. Execute search with sampled weights

The feature search stage runs each embedding search and fuses results using the sampled weights — functionally identical to weighted fusion, but with dynamically chosen weights.

4. Capture user interactions

Users interact with results: clicks, purchases, skips. Each interaction is recorded with the document ID, position, and the context key that identifies which weight sample was used.

5. Update Beta distributions

Positive interactions (clicks, purchases) increment the alpha parameter: alpha = 1 + clicks. Non-engagement increments beta: beta = 1 + (impressions - clicks). This shifts the distribution toward weights that produce engaging results.

6. Repeat with better weights

Next query: the updated distributions produce weight samples closer to what works. After hundreds of interactions, the system converges on near-optimal weights while still occasionally exploring alternatives.
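
The six steps above reduce to a short sampling loop. Below is a minimal sketch in Python (NumPy only), with illustrative feature names and a simulated click model; it sketches the algorithm, not Mixpeek's implementation:

import numpy as np

rng = np.random.default_rng()

# One Beta(alpha, beta) posterior per feature ("arm"), starting uniform.
posteriors = {
    "text": {"alpha": 1.0, "beta": 1.0},
    "image": {"alpha": 1.0, "beta": 1.0},
}

def sample_weights():
    """Step 2: draw one weight per feature, then normalize to sum to 1."""
    draws = {f: rng.beta(p["alpha"], p["beta"]) for f, p in posteriors.items()}
    total = sum(draws.values())
    return {f: w / total for f, w in draws.items()}

def record_feedback(feature, clicks, impressions):
    """Step 5: clicks grow alpha; impressions without clicks grow beta."""
    posteriors[feature]["alpha"] += clicks
    posteriors[feature]["beta"] += impressions - clicks

# Simulate users who click text-matched results 70% of the time.
for _ in range(500):
    weights = sample_weights()  # in production these would fuse the searches
    clicked_text = rng.random() < 0.7
    record_feedback("text", int(clicked_text), 1)
    record_feedback("image", int(not clicked_text), 1)

print(sample_weights())  # text weight now concentrates well above image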

Thompson Sampling Explained

Think of it like flipping weighted coins. Each feature has its own coin:
  • At the start, both coins are fair — you have no idea which feature is better, so you flip both and take whatever comes up.
  • After 50 interactions, the text feature’s coin lands “heads” 65% of the time (users click on text-matched results more). You naturally start weighting text higher, but still try image sometimes.
  • After 1000 interactions, the text coin lands heads 72% of the time with very little variance. You’re confident in the weights and rarely deviate.
The mathematical version: each “coin” is a Beta(alpha, beta) distribution where alpha counts successes (clicks) and beta counts non-successes (impressions without clicks). Sampling from this distribution gives you a weight that naturally balances exploration and exploitation.
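
To see the convergence numerically, here is a quick check with SciPy, plugging in the example counts from the bullets above (33 clicks in 50 interactions is roughly 65%; 720 in 1000 is 72%):

from scipy.stats import beta

# Posterior after c clicks in n interactions: Beta(1 + c, 1 + (n - c)).
for label, n, c in [("start", 0, 0),
                    ("after 50 interactions", 50, 33),
                    ("after 1000 interactions", 1000, 720)]:
    dist = beta(1 + c, 1 + (n - c))
    print(f"{label}: mean={dist.mean():.2f}, std={dist.std():.3f}")

The mean tracks the observed click-through rate while the standard deviation collapses (about 0.29 at the start, 0.07 after 50 interactions, 0.01 after 1000), which is the exploration-to-exploitation shift described above.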

Hierarchical Fallback

Not every user has enough interaction history for personalized weights. The system falls back through four levels:

| Level | Context | Min Interactions | When Used |
|-------|---------|------------------|----------|
| Personal | Individual user | 5 | User has clicked/purchased enough for reliable weights |
| Demographic | User segment | 1 | User is new, but their segment has data |
| Global | All users | 1 | No segment data; uses aggregate behavior |
| Prior | Uniform | 0 | No interactions at all; falls back to equal weights |
The user_id in your interaction signals enables personal-level learning. The segment field (e.g., “enterprise”, “consumer”, “power-user”) enables demographic-level learning.
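
In code, the fallback amounts to walking contexts from most to least specific. Here is a minimal sketch, assuming a hypothetical interaction-count store and using the thresholds from the table; the key names are invented:

MIN_INTERACTIONS = {"personal": 5, "demographic": 1, "global": 1}

def resolve_context(user_id, segment, counts):
    """Return the most specific context level with enough history."""
    candidates = [
        ("personal", f"user:{user_id}"),
        ("demographic", f"segment:{segment}"),
        ("global", "global"),
    ]
    for level, key in candidates:
        if counts.get(key, 0) >= MIN_INTERACTIONS[level]:
            return level, key
    return "prior", None  # no data anywhere: uniform Beta(1, 1) weights

# A new enterprise user (2 interactions) falls through to segment weights.
counts = {"user:user_456": 2, "segment:enterprise": 40, "global": 9000}
print(resolve_context("user_456", "enterprise", counts))
# -> ('demographic', 'segment:enterprise')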

End-to-End Walkthrough

1. Create a retriever with learned fusion

{
  "name": "product-search-learned",
  "stages": [
    {
      "stage_type": "filter",
      "stage_id": "feature_search",
      "parameters": {
        "searches": [
          {
            "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
            "query": "{{INPUT.query}}",
            "top_k": 100
          },
          {
            "feature_uri": "mixpeek://multimodal_extractor@v1/vertex_multimodal_embedding",
            "query": "{{INPUT.query}}",
            "top_k": 100
          }
        ],
        "fusion": "learned",
        "final_top_k": 25
      }
    }
  ]
}

2. Execute a search

curl -X POST "$MP_API_URL/v1/retrievers/{retriever_id}/execute" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -d '{
    "query": {
      "query": "wireless earbuds noise canceling"
    },
    "user_id": "user_456"
  }'
With zero interactions, the sampled weights are effectively uniform, so this behaves like equal-weight fusion. The response includes an execution_id you’ll use for interaction tracking.

3. Capture interactions

curl -X POST "$MP_API_URL/v1/retrievers/interactions" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -d '{
    "feature_id": "doc_product_789",
    "interaction_type": ["click", "purchase"],
    "position": 2,
    "metadata": {
      "query": "wireless earbuds noise canceling"
    },
    "user_id": "user_456",
    "session_id": "sess_abc"
  }'

4. Improved results over time

After 100+ interactions, the same search for user_456 returns results with personalized fusion weights. If this user consistently engages with text-matched results over image-matched ones, the text feature weight increases for their queries.

5. Verify convergence

Use analytics to check how weights are evolving:
curl "$MP_API_URL/v1/analytics/retrievers/{retriever_id}/signals?signal_type=learned_weights&hours=168" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE"

Configuration Reference

fusion (string, required)
Set to "learned" to enable Thompson Sampling fusion.

searches[].feature_uri (string, required)
Each feature URI defines an “arm” in the bandit. The system learns a separate weight for each.

user_id (string, optional)
Passed at execution time. Enables personal-level weight learning. Without it, the system uses global weights only.

The Thompson Sampler uses these internal parameters (not user-configurable):
| Parameter | Default | Description |
|-----------|---------|-------------|
| prior_alpha | 1.0 | Beta distribution alpha prior (uniform) |
| prior_beta | 1.0 | Beta distribution beta prior (uniform) |
| exploration_bonus | 1.0 | Multiplier for distribution variance; >1 increases exploration |
| min_interactions | 5 | Minimum interactions before using personal context |
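
The exact mechanics of exploration_bonus are internal. One plausible reading of “multiplier for distribution variance” is to discount the accumulated evidence before sampling, which widens the posterior; the sketch below is a hypothetical illustration, not Mixpeek’s actual internals:

import numpy as np

rng = np.random.default_rng()

def sample_with_bonus(alpha, beta, exploration_bonus=1.0):
    # Dividing the evidence counts by the bonus keeps the mean roughly
    # fixed but multiplies the variance by about the bonus (assumption).
    a = 1.0 + (alpha - 1.0) / exploration_bonus
    b = 1.0 + (beta - 1.0) / exploration_bonus
    return rng.beta(a, b)

# With 720 clicks in 1000 impressions, bonus=2 widens the spread ~1.4x.
print(sample_with_bonus(721.0, 281.0, exploration_bonus=1.0))
print(sample_with_bonus(721.0, 281.0, exploration_bonus=2.0))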

When to Use Learned vs Static

| Scenario | Recommendation | Why |
|----------|----------------|-----|
| New product, no interaction data | rrf | No data to learn from; RRF is a strong default |
| Domain expert knows feature importance | weighted | Manual weights capture expert knowledge immediately |
| Diverse user base with different preferences | learned | Different users may benefit from different feature weights |
| A/B testing fusion approaches | rrf → learned | Start with baseline, measure improvement with evaluations |
| Single search feature | None needed | Fusion only applies when combining multiple features |