## Cache Layers
| Layer | Scope | Backing Store | TTL | Purpose |
|---|---|---|---|---|
| Query response | Full retriever output | Redis | Configurable (`cache_config.ttl_seconds`) | Return the entire execution payload instantly |
| Stage output | Individual stages | Redis | Inherits retriever TTL | Reuse expensive stages (e.g., `knn_search`, `rerank`) across similar queries |
| Inference | Embeddings & rerankers | Redis | ~1 hour | Avoid recomputing identical model inferences |
| Document features | Stored vectors/payloads | Qdrant | Permanent | Reuse ingestion-time features for future queries |
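The Redis-backed layers above all share the same get/set-with-TTL access pattern. As an illustrative, in-process stand-in (not the Engine's actual Redis client), a minimal TTL cache looks like:

```python
import time

class TTLCache:
    """In-process stand-in for a Redis cache with per-entry TTL (illustrative only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            del self._store[key]  # expired: behaves like a Redis TTL eviction
            return None
        return value

    def set(self, key, value, ttl_seconds=None):
        # ttl_seconds=None models the "permanent" document-feature layer
        expires_at = time.time() + ttl_seconds if ttl_seconds else None
        self._store[key] = (value, expires_at)

cache = TTLCache()
cache.set("cache:retriever:example", {"results": [1, 2, 3]}, ttl_seconds=60)
hit = cache.get("cache:retriever:example")
miss = cache.get("cache:retriever:unknown")
```

Entries past their TTL read back as `None`, the same observable behavior a caller sees from Redis `GET` after `EXPIRE` fires.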
## Index Signatures
Each collection stores an `index_signature` in MongoDB. The signature hashes:
- Collection configuration (feature extractor, passthrough fields)
- Document count and vector dimensions
- Timestamp of last ingestion event (with debounce logic)
Query cache keys incorporate the `index_signature`, so whenever ingestion updates the collection, the signature changes and cached query responses automatically miss.
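A minimal sketch of how such a signature could be derived (the exact fields and hash function here are assumptions, not the Engine's implementation):

```python
import hashlib
import json

def index_signature(config, doc_count, dims, last_ingested_ts):
    """Hash collection config, document count, dimensions, and last ingestion time."""
    payload = json.dumps(
        {
            "config": config,
            "doc_count": doc_count,
            "dims": dims,
            "last_ingested": last_ingested_ts,
        },
        sort_keys=True,  # deterministic serialization: equal inputs hash equally
    )
    return hashlib.sha256(payload.encode()).hexdigest()

before = index_signature({"extractor": "text-embed-v1"}, 1000, 384, 1700000000)
after = index_signature({"extractor": "text-embed-v1"}, 1010, 384, 1700000600)
```

Any ingestion event changes the document count or timestamp, so `before != after` and every cache key built on the old signature misses.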
## HTTP-Friendly Responses
Retriever execution returns cache hints in response headers:
- `ETag`: hash of inputs + signature. Reuse it with `If-None-Match` to get `304 Not Modified`.
- `Cache-Control`: populated when `cache_config.enabled` is true.
- `X-Cache`: `HIT` or `MISS` to aid debugging.

When the supplied `If-None-Match` value matches the current `ETag`, the server responds with `304 Not Modified`, an empty body, and the same caching headers.
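Server-side, the conditional-request logic reduces to comparing the incoming `If-None-Match` value against the freshly computed `ETag`. A sketch under those assumptions (the `build_payload` callback and the `max-age` value are hypothetical):

```python
def conditional_response(etag, if_none_match, build_payload):
    """Return (status, body, headers) for an ETag-aware endpoint."""
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}  # max-age is illustrative
    if if_none_match == etag:
        headers["X-Cache"] = "HIT"
        return 304, None, headers  # 304 Not Modified: empty body, same caching headers
    headers["X-Cache"] = "MISS"
    return 200, build_payload(), headers

# First request: no validator, full payload is built and returned.
status_miss, body_miss, _ = conditional_response('"abc123"', None, lambda: {"results": []})
# Repeat request replaying the ETag: body is skipped entirely.
status_hit, body_hit, _ = conditional_response('"abc123"', '"abc123"', lambda: {"results": []})
```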
## Stage-Level Controls
Enable stage caching by naming stages in `cache_stage_names`:
- Stage cache keys hash stage parameters, inputs, upstream state, and collection signature.
- Perfect for hybrid pipelines where search stays constant but downstream filters change.
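The stage key described above can be sketched as a hash over those four components (the key layout and `cache:stage:` namespace are assumptions for illustration):

```python
import hashlib
import json

def stage_cache_key(stage_name, params, inputs, upstream_state, collection_signature):
    """Key a stage's output on its parameters, inputs, upstream state, and signature."""
    blob = json.dumps(
        [stage_name, params, inputs, upstream_state, collection_signature],
        sort_keys=True,  # stable ordering for nested dicts
    )
    digest = hashlib.sha256(blob.encode()).hexdigest()
    return f"cache:stage:{stage_name}:{digest}"  # hypothetical namespace

key_a = stage_cache_key("knn_search", {"k": 20}, {"query": "llms"}, {}, "sig-1")
key_b = stage_cache_key("knn_search", {"k": 20}, {"query": "llms"}, {}, "sig-2")
```

Changing only the collection signature (`sig-1` vs. `sig-2`) yields a different key, so stage outputs cached before a re-ingestion miss afterward.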
## Inference Cache
The Engine caches model calls using a hashed payload of `(model_name, inputs, parameters)`. Use it to:
- Reuse embeddings for identical prompts or documents
- Skip recomputing reranking scores for popular queries
- Short-circuit repeated LLM-based filters with static criteria
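A minimal sketch of this memoization pattern, with a hypothetical `fake_embed` standing in for a real model call:

```python
import hashlib
import json

_inference_cache = {}

def cached_inference(model_name, inputs, parameters, compute):
    """Memoize a model call on a hash of (model_name, inputs, parameters)."""
    key = hashlib.sha256(
        json.dumps([model_name, inputs, parameters], sort_keys=True).encode()
    ).hexdigest()
    if key not in _inference_cache:
        _inference_cache[key] = compute(inputs)  # only runs on a cache miss
    return _inference_cache[key]

calls = []
def fake_embed(texts):
    calls.append(texts)  # track how often the "model" actually runs
    return [[0.1, 0.2] for _ in texts]

first = cached_inference("embedder-v1", ["hello"], {"normalize": True}, fake_embed)
second = cached_inference("embedder-v1", ["hello"], {"normalize": True}, fake_embed)
```

The second call returns the cached embedding without invoking the model again, which is where GPU and LLM workloads recoup their cost.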
## Monitoring Cache Performance
- Use `GET /v1/analytics/retrievers/{id}/cache-performance` for hit/miss ratios and latency deltas.
- `stage_statistics` inside retriever responses flags `cache_hit` per stage.
- Redis namespaces per feature (e.g., `cache:retriever:...`) make it easy to inspect keys if needed.
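Using the per-stage `cache_hit` flags, a hit ratio can also be computed client-side (the exact shape of `stage_statistics` here is an assumption):

```python
def cache_hit_ratio(stage_statistics):
    """Fraction of stages served from cache in one retriever response."""
    if not stage_statistics:
        return 0.0
    hits = sum(1 for stage in stage_statistics if stage.get("cache_hit"))
    return hits / len(stage_statistics)

# Hypothetical excerpt of a retriever response's stage_statistics list.
stats = [
    {"name": "knn_search", "cache_hit": True},
    {"name": "rerank", "cache_hit": False},
]
ratio = cache_hit_ratio(stats)
```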
## Best Practices
- Enable caching for production retrievers with clear TTL requirements.
- Avoid caching when results depend on rapidly changing external data.
- Use stage caching when reranking or filtering is the bottleneck.
- Leverage inference caching for expensive LLM or GPU workloads—even small hit rates pay off.

