Mixpeek currently caches embeddings generated during retriever pipeline execution to avoid recomputation. There is no collection-level caching, and there are no cache configuration or stats endpoints at this time.
Overview
- What’s cached: Embeddings produced by the inference stage in retriever pipelines (dense and sparse).
- Why: Reduce repeated calls to the inference engine for identical inputs, cutting latency and cost.
- Where: Redis, with per-namespace key prefixing.
- Invalidation: TTL-based expiration only; no event-based invalidation.
How it works
- The inference stage builds a deterministic cache key from:
  - namespace_id
  - inference_name
  - query inputs (JSON-serialized with stable sorting)
- On hit: the cached embedding (dense list or sparse vector) is returned.
- On miss: Mixpeek calls the inference engine, then writes the result to Redis with a TTL.
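A minimal sketch of this hit/miss flow, assuming the `redis` Python client and a placeholder `call_inference_engine` function (not Mixpeek's actual internals):

```python
import json

import redis

r = redis.Redis()  # assumes a reachable Redis instance
EMBEDDING_TTL_SECONDS = 3600  # default TTL described below


def cache_key(namespace_id: str, inference_name: str, inputs: dict) -> str:
    # Stable JSON serialization (sorted keys) so equivalent inputs map to the same key.
    stable_inputs = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return f"embedding:{namespace_id}:{inference_name}:{stable_inputs}"


def get_embedding(namespace_id: str, inference_name: str, inputs: dict) -> list:
    key = cache_key(namespace_id, inference_name, inputs)
    cached = r.get(key)
    if cached is not None:
        # Hit: return the cached embedding without calling the inference engine.
        return json.loads(cached)

    # Miss: call the inference engine, then write the result back with a TTL.
    embedding = call_inference_engine(inference_name, inputs)  # hypothetical helper
    r.set(key, json.dumps(embedding), ex=EMBEDDING_TTL_SECONDS)
    return embedding
```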
Cache keys and TTL
- Key format: `embedding:{namespace_id}:{inference_name}:{stable_json_of_inputs}`
- TTL: Defaults to 3600 seconds (1 hour) for stored embeddings.
- Namespacing: Keys include the namespace_id, isolating cached embeddings per tenant.
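For illustration, the remaining lifetime of a cached entry can be inspected directly in Redis; the namespace, inference name, and inputs below are made up:

```python
import redis

r = redis.Redis()

# Hypothetical key following the format above.
key = 'embedding:ns_123:clip_vit_l_14:{"text":"red sneakers"}'

# ttl() returns the remaining seconds before expiry,
# -2 if the key does not exist, or -1 if it has no expiry set.
print(r.ttl(key))
```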
Configuration surface
- Toggle: Stages use a `use_cache` flag (defaults to enabled for inference embeddings).
- TTL: Set in the inference caching call (default 3600s). There is no per-retriever or API-level cache config.
- No APIs: There are currently no endpoints for cache configuration, statistics, or invalidation.
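As a rough illustration of the stage-level toggle, a configuration fragment might look like the following; apart from `use_cache`, the field names here are assumptions, since no public cache-config API exists:

```python
# Hypothetical inference-stage configuration fragment.
# Only the `use_cache` flag is documented; other fields are illustrative.
inference_stage = {
    "inference_name": "clip_vit_l_14",  # assumed field name
    "use_cache": True,                  # defaults to enabled for inference embeddings
}
```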
Invalidation
- TTL expiry only: Entries naturally expire after the TTL.
- Manual deletion: Possible via Redis tools, but no Mixpeek endpoints for targeted invalidation.
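Because there is no Mixpeek endpoint for targeted invalidation, manual cleanup has to go through Redis directly. A minimal sketch, assuming direct access to the backing Redis instance and the key format above (namespace and inference names are hypothetical):

```python
import redis

r = redis.Redis()  # assumes direct access to the Redis instance backing the cache

# Delete every cached embedding for one namespace and one inference model.
pattern = "embedding:ns_123:clip_vit_l_14:*"
for key in r.scan_iter(match=pattern, count=500):
    r.delete(key)
```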
Best practices
- Stable inputs: Ensure input mappings are deterministic so equivalent queries hit the same key.
- Appropriate TTL: Use longer TTLs for stable content; shorter TTLs if inputs or models change frequently.
- Sparse vs dense: Both are cached; sparse vectors are stored as indices/values.
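For sparse embeddings, the cached payload holds the non-zero indices and their values; the exact wire format is not documented, so the shape below is an assumption:

```python
import json

# Illustrative sparse embedding: non-zero dimensions and their weights.
sparse_embedding = {
    "indices": [12, 845, 10233],
    "values": [0.41, 0.12, 0.87],
}

# Serialized like dense embeddings before being written to Redis with a TTL.
payload = json.dumps(sparse_embedding, sort_keys=True)
```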