Mixpeek currently caches embeddings generated during retriever pipeline runs to avoid recomputation. There is no collection-level caching and no cache configuration or stats endpoints at this time.

Overview

  • What’s cached: Embeddings produced by the inference stage in retriever pipelines (dense and sparse).
  • Why: Reduce repeated calls to the inference engine for identical inputs, cutting latency and cost.
  • Where: Redis, with per-namespace key prefixing.
  • Invalidation: TTL-based expiration only; no event-based invalidation.

How it works

  • The inference stage builds a deterministic cache key from:
    • namespace_id
    • inference_name
    • query inputs (JSON-serialized with stable sorting)
  • On hit: the cached embedding (dense list or sparse vector) is returned.
  • On miss: Mixpeek calls the inference engine, then writes the result to Redis with a TTL (see the sketch after this list).
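
The sketch below illustrates this hit/miss flow under a few assumptions: it uses the redis-py client, takes the inference call as an injected compute_fn, and mirrors the documented key format. The function names are illustrative, not Mixpeek internals.

```python
import json

import redis

r = redis.Redis()


def build_cache_key(namespace_id: str, inference_name: str, inputs: dict) -> str:
    # Stable JSON serialization: sorted keys and compact separators so that
    # logically identical inputs always produce the same key.
    stable_inputs = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return f"embedding:{namespace_id}:{inference_name}:{stable_inputs}"


def get_or_compute_embedding(namespace_id, inference_name, inputs, compute_fn, ttl=3600):
    """Return the cached embedding on a hit; otherwise compute, store with a TTL, and return."""
    key = build_cache_key(namespace_id, inference_name, inputs)
    cached = r.get(key)
    if cached is not None:
        # Hit: the stored value is the dense list or sparse vector, JSON-encoded.
        return json.loads(cached)
    # Miss: call the inference engine (injected here as compute_fn),
    # then write the result to Redis with the configured TTL.
    embedding = compute_fn(inference_name, inputs)
    r.set(key, json.dumps(embedding), ex=ttl)
    return embedding
```

Injecting the inference call keeps the caching logic testable without a live inference engine.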

Cache keys and TTL

  • Key format: embedding:{namespace_id}:{inference_name}:{stable_json_of_inputs}
  • TTL: Default 3600 seconds for stored embeddings.
  • Namespacing: Keys include the namespace_id, isolating tenants from one another (a concrete example follows this list).
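
For illustration only, the snippet below assembles a key in the documented format; the namespace, inference name, and inputs are invented.

```python
import json

# Hypothetical values, used only to make the key layout concrete.
namespace_id = "ns_abc123"
inference_name = "text-embed"
inputs = {"query": "red running shoes"}

stable_inputs = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
key = f"embedding:{namespace_id}:{inference_name}:{stable_inputs}"
print(key)
# embedding:ns_abc123:text-embed:{"query":"red running shoes"}
```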

Configuration surface

  • Toggle: Stages use a use_cache flag, which defaults to enabled for inference embeddings (see the sketch after this list).
  • TTL: Set in the inference caching call (default 3600s). There is no per-retriever or API-level cache config.
  • No APIs: There are currently no endpoints for cache configuration, statistics, or invalidation.
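
Because there is no public cache API, the following is only a hedged sketch of what the use_cache toggle and TTL default imply: disabled means bypass Redis entirely, enabled means read-through caching with a 3600-second expiry. Only the use_cache name and the default TTL come from the notes above; everything else is illustrative.

```python
import json

import redis

r = redis.Redis()


def cached_embedding(key: str, compute_fn, use_cache: bool = True, ttl: int = 3600):
    # use_cache=False bypasses Redis entirely; otherwise read-through with a TTL.
    if use_cache:
        hit = r.get(key)
        if hit is not None:
            return json.loads(hit)
    embedding = compute_fn()
    if use_cache:
        r.set(key, json.dumps(embedding), ex=ttl)
    return embedding
```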

Invalidation

  • TTL expiry only: Entries naturally expire after the TTL.
  • Manual deletion: Possible via Redis tools (see the sketch below), but there are no Mixpeek endpoints for targeted invalidation.
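
A minimal sketch of manual invalidation with the redis-py client, assuming the key format documented above; deleting keys this way happens outside Mixpeek and should be done carefully in production.

```python
import redis

r = redis.Redis()


def invalidate_namespace_embeddings(namespace_id: str) -> int:
    """Delete every cached embedding for one namespace; returns the number of keys removed."""
    pattern = f"embedding:{namespace_id}:*"
    deleted = 0
    # scan_iter avoids blocking Redis the way a single KEYS call would.
    for key in r.scan_iter(match=pattern, count=500):
        deleted += r.delete(key)
    return deleted
```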

Best practices

  • Stable inputs: Ensure input mappings are deterministic so equivalent queries hit the same key.
  • Appropriate TTL: Use longer TTLs for stable content; shorter TTLs if inputs or models change frequently.
  • Sparse vs dense: Both are cached; sparse vectors are stored as indices/values (see the example after this list).
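
Two of these points are easy to show concretely. The snippet below demonstrates why stable serialization matters for key reuse and sketches the indices/values shape used for sparse vectors; the specific numbers are made up.

```python
import json

# Logically identical inputs with different key order serialize identically
# only when keys are sorted, so both queries hit the same cache entry.
a = {"text": "red shoes", "modality": "text"}
b = {"modality": "text", "text": "red shoes"}
assert json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)

# Sparse vectors are cached as parallel indices/values arrays
# (illustrative values; the real dimensionality depends on the model).
sparse_embedding = {"indices": [12, 98, 734], "values": [0.31, 0.08, 0.54]}
dense_embedding = [0.021, -0.113, 0.448]  # dense embeddings are plain float lists
```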