Mixpeek layers several caches to deliver low-latency responses while guaranteeing consistency. Every layer relies on deterministic signatures so you never serve results from an outdated index.

Cache Layers

Layer             | Scope                   | Backing Store | TTL                                     | Purpose
Query response    | Full retriever output   | Redis         | Configurable (cache_config.ttl_seconds) | Return the entire execution payload instantly
Stage output      | Individual stages       | Redis         | Inherits retriever TTL                  | Reuse expensive stages (e.g., knn_search, rerank) across similar queries
Inference         | Embeddings & rerankers  | Redis         | ~1 hour                                 | Avoid recomputing identical model inferences
Document features | Stored vectors/payloads | Qdrant        | Permanent                               | Reuse ingestion-time features for future queries
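The layers behave as a fall-through chain: the full query-response cache is consulted first, and only on a miss does execution proceed (reusing stage and inference caches along the way). A minimal Python sketch of that fall-through, assuming in-memory dicts as stand-ins for the Redis namespaces (the class and function names are illustrative, not Mixpeek internals):

```python
# Illustrative fall-through across the cache layers described above.
# Plain dicts stand in for the Redis-backed namespaces.

class LayeredCache:
    def __init__(self):
        self.layers = {"query": {}, "stage": {}, "inference": {}}

    def get(self, layer, key):
        return self.layers[layer].get(key)

    def put(self, layer, key, value):
        self.layers[layer][key] = value

def execute(cache, query_key):
    # 1. Full query-response hit: return the whole payload instantly.
    payload = cache.get("query", query_key)
    if payload is not None:
        return payload, "query-cache"
    # 2. Otherwise recompute (the real pipeline would consult the
    #    stage and inference layers for each stage it runs).
    payload = {"results": ["..."]}
    cache.put("query", query_key, payload)
    return payload, "computed"

cache = LayeredCache()
_, first = execute(cache, "retriever:abc")
_, second = execute(cache, "retriever:abc")
print(first, second)  # computed query-cache
```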

Index Signatures

Each collection stores an index_signature in MongoDB. The signature hashes:
  • Collection configuration (feature extractor, passthrough fields)
  • Document count and vector dimensions
  • Timestamp of last ingestion event (with debounce logic)
Retriever cache keys include index_signature, so whenever ingestion updates the collection the signature changes and cached query responses automatically miss.
cache:retriever:quickstart-search:
  hash(
    inputs,
    filters,
    pagination,
    collection_signature="xyz789"
  )
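One way such a key could be derived is to hash a deterministic serialization of the request shape together with the collection signature, so any ingestion-driven signature change yields a different key. A sketch under that assumption (the hash function, truncation, and field names are illustrative, not the exact production scheme):

```python
import hashlib
import json

def retriever_cache_key(retriever_id, inputs, filters, pagination, collection_signature):
    # Serialize deterministically so identical requests hash identically.
    material = json.dumps(
        {
            "inputs": inputs,
            "filters": filters,
            "pagination": pagination,
            "collection_signature": collection_signature,
        },
        sort_keys=True,
    )
    digest = hashlib.sha256(material.encode()).hexdigest()[:16]
    return f"cache:retriever:{retriever_id}:{digest}"

before = retriever_cache_key("quickstart-search", {"query_text": "smart speaker"}, None, {"limit": 5}, "xyz789")
after = retriever_cache_key("quickstart-search", {"query_text": "smart speaker"}, None, {"limit": 5}, "new-sig")
print(before != after)  # True: ingestion changed the signature, so old entries miss
```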

HTTP-Friendly Responses

Retriever execution returns cache hints in response headers:
  • ETag — hash of inputs + signature. Reuse it with If-None-Match to get 304 Not Modified.
  • Cache-Control — populated when cache_config.enabled is true.
  • X-Cache — HIT or MISS, to aid debugging.
Example:
curl -i -X POST "$MP_API_URL/v1/retrievers/<id>/execute" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H 'Content-Type: application/json' \
  -d '{ "inputs": { "query_text": "smart speaker" }, "limit": 5 }'
Second execution with If-None-Match:
curl -i -X POST "$MP_API_URL/v1/retrievers/<id>/execute" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H 'If-None-Match: "abc123"' \
  -H 'Content-Type: application/json' \
  -d '{ "inputs": { "query_text": "smart speaker" }, "limit": 5 }'
If nothing changed, you’ll receive 304 Not Modified with an empty body and the same caching headers.
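The 304 decision is standard conditional-request logic: compare the client's If-None-Match value against the current ETag and skip the body on a match. A server-side sketch of that comparison (this is generic HTTP behavior, not Mixpeek's actual handler):

```python
def conditional_response(current_etag, if_none_match):
    # ETags arrive quoted, e.g. '"abc123"'; strip quotes before comparing.
    if if_none_match is not None and if_none_match.strip('"') == current_etag.strip('"'):
        return 304, None  # Not Modified: empty body, same caching headers
    return 200, {"results": ["..."], "etag": current_etag}

status, body = conditional_response('"abc123"', '"abc123"')
print(status)  # 304
```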

Stage-Level Controls

Enable stage caching by naming stages in cache_stage_names:
{
  "cache_config": {
    "enabled": true,
    "ttl_seconds": 600,
    "cache_stage_names": ["knn_search", "rerank"]
  }
}
  • Stage cache keys hash stage parameters, inputs, upstream state, and collection signature.
  • Perfect for hybrid pipelines where search stays constant but downstream filters change.
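The value for hybrid pipelines follows from what goes into the key: two requests whose search inputs match share the knn_search stage entry even when downstream parameters differ. A hedged sketch of such a stage key (the hashing scheme is an assumption, mirroring the ingredients listed above):

```python
import hashlib
import json

def stage_cache_key(stage_name, stage_params, upstream_state, collection_signature):
    # Hash the ingredients named above: stage parameters, inputs/upstream
    # state, and the collection signature.
    material = json.dumps(
        [stage_name, stage_params, upstream_state, collection_signature],
        sort_keys=True,
    )
    return f"cache:stage:{stage_name}:" + hashlib.sha256(material.encode()).hexdigest()[:12]

# Same search inputs -> same knn_search key, regardless of what a later
# filter or rerank stage does with the results.
k1 = stage_cache_key("knn_search", {"k": 50}, {"query_text": "smart speaker"}, "xyz789")
k2 = stage_cache_key("knn_search", {"k": 50}, {"query_text": "smart speaker"}, "xyz789")
print(k1 == k2)  # True: the expensive search stage is reused
```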

Inference Cache

The Engine caches model calls using a hashed payload of (model_name, inputs, parameters). Use it to:
  • Reuse embeddings for identical prompts or documents
  • Skip recomputing reranking scores for popular queries
  • Short-circuit repeated LLM-based filters with static criteria

Monitoring Cache Performance

  • Use GET /v1/analytics/retrievers/{id}/cache-performance for hit/miss ratios and latency deltas.
  • stage_statistics inside retriever responses flag cache_hit per stage.
  • Redis namespaces per feature (e.g., cache:retriever:...) make it easy to inspect keys if needed.

Best Practices

  • Enable caching for production retrievers with clear TTL requirements.
  • Avoid caching when results depend on rapidly changing external data.
  • Use stage caching when reranking or filtering is the bottleneck.
  • Leverage inference caching for expensive LLM or GPU workloads—even small hit rates pay off.
Caching is optional, but when enabled it integrates cleanly with Mixpeek’s index signatures so you never have to invalidate caches manually.