Cache Architecture
Cache keys are computed from the following components (a hashing sketch follows the list):
- Retriever ID
- Input parameters (normalized)
- Filters and sort criteria
- Stage configurations (if stage-level caching)
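Conceptually, these components are normalized and hashed into one stable key. A minimal Python sketch of the idea — the field names are illustrative, not Mixpeek's actual schema:

```python
import hashlib
import json

def cache_key(retriever_id: str, inputs: dict, filters: dict,
              sort: list, stage_config: dict | None = None) -> str:
    """Hash the normalized request into a stable, order-independent key."""
    payload = {
        "retriever_id": retriever_id,
        "inputs": inputs,
        "filters": filters,
        "sort": sort,
        "stage_config": stage_config,  # set only for stage-level caching
    }
    # sort_keys makes dict ordering irrelevant; separators keep it compact
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```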
Cache Levels
Retriever-Level Caching
Caches entire retriever execution results. Use this level when:
- Identical queries are common (e.g., trending searches, popular categories)
- Expensive pipelines with multiple stages
- Results don’t change frequently
Stage-Level Caching
Caches specific stage outputs within a pipeline. Use this level when:
- Some stages are expensive (LLM, web search) but others must be fresh (filters, sorts)
- Partial cache hits provide value
- Different queries share intermediate results
Feature Store Caching
Embeddings are cached implicitly in Qdrant vectors. Re-embedding identical text is avoided automatically.
TTL Selection Guide
| Use Case | Recommended TTL | Rationale |
|---|---|---|
| Real-time dashboards | 10-30 seconds | Balance freshness vs load |
| Product search | 5-15 minutes | Inventory changes slowly |
| Customer support KB | 30-60 minutes | Documentation is stable |
| News/trending content | 1-2 minutes | Content updates frequently |
| LLM-generated summaries | 1-24 hours | Expensive to regenerate |
| Exploratory research | 5-10 minutes | Users iterate rapidly |
Cache Key Normalization
Automatic Normalization
Mixpeek normalizes cache keys to maximize hit rate.
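The exact rules are internal to Mixpeek; conceptually, the normalization resembles this illustrative sketch:

```python
def normalize(text: str) -> str:
    # Collapse case and whitespace so trivially different spellings of the
    # same query ("Red  Shoes " vs "red shoes") share one cache key.
    return " ".join(text.lower().split())

assert normalize("  Red   Shoes ") == normalize("red shoes")
```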
Custom Normalization
Control which inputs affect cache keys. For example, excluding session_id from cache_key_fields means requests with different session IDs but identical queries share the same cache entry.
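A hedged config sketch — cache_key_fields is the field referenced in Troubleshooting below; the surrounding field names are assumptions:

```python
# Only the listed fields contribute to the cache key; everything else
# (session_id, request IDs, timestamps) is ignored for keying purposes.
cache_config = {
    "enabled": True,
    "ttl_seconds": 300,
    "cache_key_fields": ["query", "filters", "sort"],  # session_id omitted
}
```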
Invalidation Strategies
Time-Based (TTL)
Default strategy. Cache entries expire once the TTL elapses.
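A sketch of setting the TTL on a retriever, assuming a hypothetical REST route and payload shape — consult the API reference for the real ones:

```python
import requests

requests.patch(
    "https://api.mixpeek.com/v1/retrievers/product-search",  # hypothetical route
    headers={"Authorization": "Bearer <API_KEY>"},
    json={"cache": {"enabled": True, "ttl_seconds": 600}},   # 10-minute TTL
    timeout=10,
).raise_for_status()
```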
Manual Invalidation
Invalidate specific retriever caches (a sketch of the call follows this list):
- After reindexing documents
- After collection schema updates
- After taxonomy/cluster enrichment
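A minimal sketch, assuming a hypothetical DELETE endpoint for a retriever's cache:

```python
import requests

def invalidate_retriever_cache(retriever_id: str, api_key: str) -> None:
    # Hypothetical endpoint; check the API reference for the real route.
    requests.delete(
        f"https://api.mixpeek.com/v1/retrievers/{retriever_id}/cache",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    ).raise_for_status()

# e.g. right after a reindex completes:
invalidate_retriever_cache("product-search", "<API_KEY>")
```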
Event-Driven Invalidation
Use webhooks to invalidate on document updates.
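A sketch of a webhook receiver; the event payload shape and the invalidation endpoint are assumptions rather than Mixpeek's documented contract:

```python
import requests
from flask import Flask, request

app = Flask(__name__)

@app.post("/webhooks/document-updated")
def on_document_updated():
    # Assumed payload: {"affected_retrievers": ["product-search", ...]}
    event = request.get_json(force=True)
    for retriever_id in event.get("affected_retrievers", []):
        requests.delete(  # hypothetical invalidation endpoint
            f"https://api.mixpeek.com/v1/retrievers/{retriever_id}/cache",
            headers={"Authorization": "Bearer <API_KEY>"},
            timeout=10,
        )
    return {"ok": True}
```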
Optimizing Cache Hit Rate
1. Pre-Warm Cache
For predictable queries, warm the cache before peak traffic.
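A sketch that executes known-popular queries ahead of time so the first real user gets a hit; the execute route and payload are illustrative:

```python
import requests

TOP_QUERIES = ["running shoes", "wireless earbuds", "standing desk"]

for q in TOP_QUERIES:
    # Each execution populates the retriever-level cache for that query.
    requests.post(
        "https://api.mixpeek.com/v1/retrievers/product-search/execute",
        headers={"Authorization": "Bearer <API_KEY>"},
        json={"inputs": {"query": q}},
        timeout=30,
    )
```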
2. Query Canonicalization
Normalize queries before submission.
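An illustrative client-side canonicalizer — the stopword and synonym lists are placeholders you would tune to your domain:

```python
import re

STOPWORDS = {"the", "a", "an", "for", "of"}
SYNONYMS = {"sneakers": "shoes", "tv": "television"}

def canonicalize(query: str) -> str:
    # Lowercase, strip punctuation, drop stopwords, map synonyms.
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return " ".join(SYNONYMS.get(t, t) for t in tokens if t not in STOPWORDS)

# "The Sneakers for Men!" and "sneakers men" now share one cache entry.
assert canonicalize("The Sneakers for Men!") == canonicalize("sneakers men")
```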
3. Aggregate Similar Queries
Use query clustering to map variants to canonical forms.
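A sketch of the lookup step; the variant-to-canonical table would come from offline clustering of your query logs (e.g. on query embeddings):

```python
CANONICAL_FORM = {
    "cheap flights nyc": "flights to new york",
    "flights new york cheap": "flights to new york",
    "ny flight deals": "flights to new york",
}

def to_canonical(query: str) -> str:
    # All known variants resolve to one query, and thus one cache entry.
    return CANONICAL_FORM.get(query.strip().lower(), query)
```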
4. Cache Partial Results
For multi-stage pipelines, cache expensive stages even if final results differ.
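A hedged config sketch using cache_stage_names (the field referenced in Troubleshooting below); the stage names themselves are placeholders:

```python
# Only the expensive stages are cached; filters and sorts re-run on every
# request, so final results can still differ between cached executions.
pipeline_cache_config = {
    "enabled": True,
    "cache_stage_names": ["llm_summarize", "web_search"],
    "ttl_seconds": 3600,
}
```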
Monitoring Cache Performance
Track Hit Rate
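One client-side way to estimate hit rate, assuming the API marks cache hits with a response header — the header name and route here are assumptions:

```python
import requests

hits = total = 0

def execute(query: str) -> dict:
    global hits, total
    resp = requests.post(
        "https://api.mixpeek.com/v1/retrievers/product-search/execute",
        headers={"Authorization": "Bearer <API_KEY>"},
        json={"inputs": {"query": query}},
        timeout=30,
    )
    total += 1
    if resp.headers.get("x-cache", "").lower() == "hit":  # assumed header
        hits += 1
    return resp.json()

# hit_rate = hits / total after a representative traffic sample
```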
Per-Retriever Metrics
Break hit rate down by retriever to spot which configurations underperform.
Optimize Based on Metrics
| Metric | Observation | Action |
|---|---|---|
| Hit rate <50% | Queries too diverse | Increase TTL, add query canonicalization |
| Hit rate >90% | Over-caching | Reduce TTL to save memory |
| High miss latency | Expensive execution | Enable stage-level caching |
| Large cache size | Memory pressure | Reduce TTL or limit cached stages |
Cost-Performance Trade-Offs
Scenario 1: High-Volume, Repetitive Queries
Example: E-commerce product search with trending queries.
Strategy: retriever-level caching with a generous TTL, since the same queries recur constantly.
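An illustrative config (field names are assumptions):

```python
# Cache whole results with a long TTL; repetitive trending queries then
# rarely miss.
trending_search_cache = {
    "enabled": True,
    "ttl_seconds": 900,          # 15 min: inventory changes slowly
    "cache_key_fields": ["query", "filters"],
}
```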
Scenario 2: Unique, Expensive Queries
Example: Research assistant with LLM summarization.
Strategy: stage-level caching of the LLM stage with a long TTL; queries differ, but expensive summaries are reused wherever inputs repeat.
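An illustrative config (the stage name is a placeholder):

```python
# Cache only the expensive LLM stage; retrieval and filtering stay fresh
# while generated summaries are reused for up to a day.
research_assistant_cache = {
    "enabled": True,
    "cache_stage_names": ["llm_summarize"],
    "ttl_seconds": 86400,        # 24 h: expensive to regenerate
}
```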
Scenario 3: Real-Time Dashboards
Example: Live analytics with frequent updates.
Strategy: very short TTLs, optionally with background refresh, so results stay near-real-time while bursts of identical requests still hit cache.
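An illustrative config:

```python
# A very short TTL keeps dashboards near-real-time while still absorbing
# bursts of identical requests.
dashboard_cache = {
    "enabled": True,
    "ttl_seconds": 15,           # within the 10-30 s guidance above
}
```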
Advanced Patterns
Conditional Caching
Cache only if the query matches criteria.
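A sketch of client-side criteria; the use_cache request flag is an assumption about the payload, not a documented parameter:

```python
def should_cache(query: str, personalized: bool) -> bool:
    # Example criteria: only short, non-personalized queries are worth
    # caching; personalized results would fragment the cache anyway.
    return not personalized and len(query.split()) <= 4

request_body = {
    "inputs": {"query": "running shoes"},
    "use_cache": should_cache("running shoes", personalized=False),
}
```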
Cache Warming Pipeline
Pre-populate the cache with analytics-driven queries (the pre-warm sketch under Optimizing Cache Hit Rate applies, fed with your top queries from analytics).
Multi-Tier Caching
Combine local (client-side) and remote (Mixpeek) caches.
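A sketch of a small in-process tier in front of Mixpeek's remote cache; the execute route is illustrative:

```python
import time
import requests

_local: dict[str, tuple[float, dict]] = {}
LOCAL_TTL = 10.0  # seconds; keep shorter than the remote TTL

def search(query: str) -> dict:
    now = time.monotonic()
    cached = _local.get(query)
    if cached and now - cached[0] < LOCAL_TTL:
        return cached[1]                      # tier 1: in-process, no network
    resp = requests.post(                     # tier 2: Mixpeek's own cache
        "https://api.mixpeek.com/v1/retrievers/product-search/execute",
        headers={"Authorization": "Bearer <API_KEY>"},
        json={"inputs": {"query": query}},
        timeout=30,
    )
    result = resp.json()
    _local[query] = (now, result)
    return result
```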
Cache Expiration Strategies
Passive Expiration
Default behavior. Entries expire when the TTL is reached.
Active Refresh
Update the cache before expiration with a background job.
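A sketch of a refresh loop that re-executes hot queries just before the TTL lapses; the interval, query list, and route are illustrative:

```python
import threading
import requests

HOT_QUERIES = ["trending now", "best sellers"]
TTL_SECONDS = 600

def refresh_loop() -> None:
    for q in HOT_QUERIES:
        # Re-executing repopulates the cache entry with fresh results.
        requests.post(
            "https://api.mixpeek.com/v1/retrievers/product-search/execute",
            headers={"Authorization": "Bearer <API_KEY>"},
            json={"inputs": {"query": q}},
            timeout=30,
        )
    # Re-run slightly before the TTL expires so users never pay miss latency.
    threading.Timer(TTL_SECONDS * 0.9, refresh_loop).start()

refresh_loop()
```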
Lazy Invalidation
Invalidate on write, but serve stale data while fresh results are recomputed.
Best Practices
Start with conservative TTLs
Begin with short TTLs (60-300s) and increase based on observed freshness requirements and hit rate.
Cache expensive stages selectively
Don’t cache cheap operations (filters, sorts). Focus on LLM, web search, and reranking stages.
Monitor cache memory usage
Large caches impact performance. Tune TTLs and cache key fields to balance hit rate vs memory.
Invalidate proactively
When reindexing or updating taxonomy, manually invalidate affected retriever caches.
Normalize queries aggressively
Lowercase, trim, remove stop words, and canonicalize synonyms to maximize cache hits.
Use separate retrievers for different cache behaviors
E.g., product-search-fresh (TTL=30s) vs product-search-stable (TTL=900s) for different UX contexts.
Troubleshooting
Low Hit Rate
Symptoms: <50% hit rate despite repetitive queries
Diagnosis:
- Unnecessary fields in cache_key_fields
- Lack of query normalization
- Session IDs or timestamps affecting keys
Stale Results
Symptoms: Users report outdated search results
Diagnosis: Check TTL against update frequency
Fix:
- Reduce TTL
- Implement event-driven invalidation
- Use allow_stale=false for critical queries
Cache Memory Pressure
Symptoms: High Redis memory usage, evictions
Fix:
- Reduce TTL globally
- Limit cache_stage_names to expensive stages only
- Implement an LRU eviction policy
Next Steps
- Review Caching architecture overview
- Monitor with Analytics
- Optimize costs with Rate Limits & Quotas
- Explore Retrievers configuration options

