text: 0.7, image: 0.3), the system learns from interaction data which features produce results users engage with.
How It Works
Learned fusion uses Thompson Sampling, a well-studied algorithm for the multi-armed bandit problem. Here’s how it applies to search fusion:
Initialize with uniform priors
Each search feature (e.g., text embeddings, image embeddings) starts with a Beta(1, 1) distribution — a flat line that assigns equal probability to all weight values. This means zero assumptions about which feature is better.
Sample weights at query time
When a query arrives, the system draws a random weight from each feature’s Beta distribution and normalizes them to sum to 1. Early on, samples are highly variable (exploration). As data accumulates, they stabilize (exploitation).
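The prior and sampling steps above can be sketched in a few lines. This is illustrative Python; the feature names and data shapes are assumptions, not the system's actual internals:

```python
import random

# One Beta(alpha, beta) posterior per search feature.
# Beta(1, 1) is the uniform prior: every weight value is equally likely.
posteriors = {
    "text": {"alpha": 1.0, "beta": 1.0},
    "image": {"alpha": 1.0, "beta": 1.0},
}

def sample_weights(posteriors):
    """Draw one weight per feature, then normalize them to sum to 1."""
    draws = {
        name: random.betavariate(p["alpha"], p["beta"])
        for name, p in posteriors.items()
    }
    total = sum(draws.values())
    return {name: d / total for name, d in draws.items()}

weights = sample_weights(posteriors)
```

With flat priors, repeated calls produce very different weight splits (exploration); as alpha and beta grow, the draws concentrate around the posterior mean (exploitation).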
Execute search with sampled weights
The feature search stage runs each embedding search and fuses results using the sampled weights — functionally identical to weighted fusion, but with dynamically chosen weights.
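Functionally, the fusion itself is just a weighted sum of per-feature scores. A minimal sketch, assuming each feature search returns a doc-to-score map (the doc IDs and scores are made up for illustration):

```python
def fuse(results_by_feature, weights):
    """Combine per-feature score maps into one ranking using sampled weights.

    results_by_feature: {feature_name: {doc_id: score}}
    weights: {feature_name: weight}, summing to 1
    """
    fused = {}
    for feature, scores in results_by_feature.items():
        w = weights[feature]
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * score
    # Highest fused score first
    return sorted(fused, key=fused.get, reverse=True)

ranking = fuse(
    {"text": {"doc_a": 0.9, "doc_b": 0.4},
     "image": {"doc_a": 0.2, "doc_b": 0.8}},
    {"text": 0.7, "image": 0.3},
)
# doc_a: 0.7*0.9 + 0.3*0.2 = 0.69; doc_b: 0.7*0.4 + 0.3*0.8 = 0.52
# so ranking == ["doc_a", "doc_b"]
```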
Capture user interactions
Users interact with results: clicks, purchases, skips. Each interaction is recorded with the document ID, position, and the context key that identifies which weight sample was used.
Update Beta distributions
Positive interactions (clicks, purchases) increment the alpha parameter: alpha = 1 + clicks. Non-engagement increments beta: beta = 1 + (impressions - clicks). This shifts the distribution toward weights that produce engaging results.
Thompson Sampling Explained
Think of it like flipping weighted coins. Each feature has its own coin:
- At the start, both coins are fair — you have no idea which feature is better, so you flip both and take whatever comes up.
- After 50 interactions, the text feature’s coin lands “heads” 65% of the time (users click on text-matched results more). You naturally start weighting text higher, but still try image sometimes.
- After 1000 interactions, the text coin lands heads 72% of the time with very little variance. You’re confident in the weights and rarely deviate.
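The coin analogy maps directly onto the Beta update described above (alpha = 1 + clicks, beta = 1 + (impressions - clicks)). A small worked example in Python:

```python
def update(posterior, clicks, impressions):
    """Fold a batch of interactions into a feature's Beta posterior."""
    posterior["alpha"] += clicks
    posterior["beta"] += impressions - clicks
    return posterior

# Start from the uniform Beta(1, 1) prior...
text = {"alpha": 1.0, "beta": 1.0}
# ...then observe 65 clicks over 100 impressions on text-matched results.
update(text, clicks=65, impressions=100)

# Posterior mean = alpha / (alpha + beta) = 66 / 102, roughly 0.647:
# the "coin" now lands heads about 65% of the time, matching the story above.
mean = text["alpha"] / (text["alpha"] + text["beta"])
```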
Hierarchical Fallback
Not every user has enough interaction history for personalized weights. The system uses a hierarchical fallback:

| Level | Context | Min Interactions | When Used |
|---|---|---|---|
| Personal | Individual user | 5 | User has clicked/purchased enough for reliable weights |
| Demographic | User segment | 1 | User is new, but their segment has data |
| Global | All users | 1 | No segment data; uses aggregate behavior |
| Prior | Uniform | 0 | No interactions at all; falls back to equal weights |
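The fallback logic amounts to walking the table top to bottom and stopping at the first context with enough data. A sketch (the counts dictionary and level names here are illustrative, not the system's API):

```python
def pick_context(interaction_counts):
    """Return the most specific context level with enough interaction data.

    interaction_counts: {level_name: number_of_interactions}
    Thresholds mirror the fallback table: personal needs 5+,
    demographic and global need at least 1.
    """
    levels = [("personal", 5), ("demographic", 1), ("global", 1)]
    for level, min_interactions in levels:
        if interaction_counts.get(level, 0) >= min_interactions:
            return level
    return "prior"  # no data anywhere: fall back to uniform weights

assert pick_context({"personal": 12, "global": 900}) == "personal"
assert pick_context({"personal": 2, "demographic": 40}) == "demographic"
assert pick_context({}) == "prior"
```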
Providing a user_id in your interaction signals enables personal-level learning. The segment field (e.g., “enterprise”, “consumer”, “power-user”) enables demographic-level learning.
End-to-End Walkthrough
1. Create a retriever with learned fusion
2. Execute a search
The response includes an execution_id you’ll use for interaction tracking.
3. Capture interactions
4. Improved results over time
After 100+ interactions, the same search for user_456 returns results with personalized fusion weights. If this user consistently engages with text-matched results over image-matched ones, the text feature weight increases for their queries.
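You can see this drift in a toy simulation (plain Python, independent of the product's API). For simplicity it credits each feature's clicks directly instead of attributing them through fused rankings:

```python
import random

random.seed(0)

# One Beta posterior per feature, starting from the uniform prior.
posteriors = {"text": {"alpha": 1.0, "beta": 1.0},
              "image": {"alpha": 1.0, "beta": 1.0}}

# Toy interaction stream: users click text-matched results 70% of the
# time and image-matched results 30% of the time.
click_rate = {"text": 0.7, "image": 0.3}
for _ in range(200):
    for name, p in posteriors.items():
        clicked = random.random() < click_rate[name]
        p["alpha"] += clicked
        p["beta"] += 1 - clicked

def mean(p):
    return p["alpha"] / (p["alpha"] + p["beta"])

# After 200 simulated queries the text posterior mean sits well above
# the image posterior mean, so sampled weights favor text.
assert mean(posteriors["text"]) > mean(posteriors["image"])
```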
5. Verify convergence
Use analytics to check how weights are evolving:
Configuration Reference
Set to "learned" to enable Thompson Sampling fusion.
Each feature URI defines an “arm” in the bandit. The system learns a separate weight for each.
Passed at execution time. Enables personal-level weight learning. Without this, the system uses global weights only.
| Parameter | Default | Description |
|---|---|---|
| prior_alpha | 1.0 | Beta distribution alpha prior (uniform) |
| prior_beta | 1.0 | Beta distribution beta prior (uniform) |
| exploration_bonus | 1.0 | Multiplier for distribution variance; >1 increases exploration |
| min_interactions | 5 | Minimum interactions before using personal context |
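The table doesn't pin down how exploration_bonus widens the distribution. One common trick that increases variance without moving the mean is to divide both Beta parameters by the bonus; this sketch is written under that assumption and may not match the actual mechanism:

```python
import random

def sample_with_exploration(alpha, beta, exploration_bonus=1.0):
    """Sample from a widened Beta posterior.

    Dividing both parameters by the bonus keeps the mean
    alpha / (alpha + beta) unchanged but increases the variance,
    so a bonus > 1 makes the sampler explore more.
    (Illustrative assumption, not the documented mechanism.)
    """
    return random.betavariate(alpha / exploration_bonus,
                              beta / exploration_bonus)

# A well-trained posterior, sampled with doubled exploration.
w = sample_with_exploration(66.0, 36.0, exploration_bonus=2.0)
```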
When to Use Learned vs Static
| Scenario | Recommendation | Why |
|---|---|---|
| New product, no interaction data | rrf | No data to learn from; RRF is a strong default |
| Domain expert knows feature importance | weighted | Manual weights capture expert knowledge immediately |
| Diverse user base with different preferences | learned | Different users may benefit from different feature weights |
| A/B testing fusion approaches | rrf → learned | Start with baseline, measure improvement with evaluations |
| Single search feature | None needed | Fusion only applies when combining multiple features |
Related
- Fusion Strategies — comparison of all 5 strategies
- Interaction Signals — capturing the data that powers learning
- Evaluations — measuring learned fusion quality
- Feature Search stage — where fusion is configured

