Mixpeek uses per-stage caching: each stage of your retriever pipeline (inference, search, reranking) caches independently. This means you can get partial cache hits even when only part of your pipeline changes, dramatically reducing compute costs.
Overview
- Architecture: Per-stage caching (each stage caches independently)
- Key Benefit: Partial cache hits save compute even when pipeline changes
- Memory Management: LRU eviction (no TTL tuning needed)
- Performance: Sub-millisecond per-stage cache hits (~1ms for a fully cached pipeline), 80%+ cost reduction
- Backend: Redis with automatic eviction
How It Works
The Problem with Traditional Caching
Traditional retriever caching is all-or-nothing:
Query: "dogs on skateboards"
Pipeline: Embed Query → Vector Search → Rerank → Return
Cache Key: hash(entire_pipeline)
❌ Change rerank model?
→ Full cache miss
→ Re-embed query (expensive GPU call)
→ Re-search vectors (expensive DB query)
→ Re-rerank with new model
Total waste: Embedding + Vector search compute
The Per-Stage Solution
With per-stage caching, each stage manages its own cache:
Query: "dogs on skateboards"
Stage 1: Inference (Embedding)
Cache Key: query_text + model_config
✅ HIT → Reuse cached embedding
Stage 2: Vector Search
Cache Key: embedding + filters + collection
✅ HIT → Reuse cached search results
Stage 3: Reranking
Cache Key: doc_ids + rerank_model
❌ MISS → Only rerank (model changed!)
Result: Saved 95% of compute by reusing stages 1 & 2!
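For intuition, here is a minimal Python sketch of the idea (hypothetical helper names, not the Mixpeek SDK): each stage derives a cache key from only the inputs it depends on, checks its own cache, and stores its own result.
# Minimal sketch of a per-stage lookup (illustrative only)
import hashlib
import json

def stage_cache_key(stage_name: str, namespace: str, inputs: dict) -> str:
    """Hash only the inputs this stage depends on, scoped to the namespace."""
    payload = json.dumps(inputs, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()[:12]
    return f"stage:{stage_name}:{namespace}:hash_{digest}"

def run_stage(cache: dict, stage_name: str, namespace: str, inputs: dict, compute):
    """Check this stage's cache before executing; store the result afterwards."""
    key = stage_cache_key(stage_name, namespace, inputs)
    if key in cache:            # HIT: reuse the previous result
        return cache[key]
    result = compute(inputs)    # MISS: run only this stage
    cache[key] = result
    return result
Because the rerank stage's key does not include the embedding inputs (and vice versa), changing one stage's configuration leaves the other stages' cached entries untouched.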
Automatic Operation
Stage caching happens automatically — you don’t need to configure anything. Each stage checks its cache before executing and stores results after execution.
# Execute a retriever (caching happens transparently)
curl https://api.mixpeek.com/v1/retrievers/ret_semantic/execute \
-H "Authorization: Bearer $API_KEY" \
-H "X-Namespace: ns_acme" \
-d '{
"inputs": {"text": "machine learning"},
"limit": 10
}'
# First request: All stages MISS
# - Inference: Generate embedding (~100ms)
# - KNN: Search vectors (~1500ms)
# - Rerank: Rerank results (~200ms)
# Total: ~1800ms
# Second request: All stages HIT
# - Inference: Cached embedding (~0.4ms)
# - KNN: Cached results (~0.4ms)
# - Rerank: Cached ranking (~0.4ms)
# Total: ~1.2ms (1500x faster!)
Stage-Specific Caching
Inference Stage (Embeddings)
What gets cached: Text, image, and video embeddings
Cache key: input_data + model_config
When to invalidate: Rarely (only when embedding model changes)
Embeddings are deterministic — the same input with the same model always produces the same output. This makes inference caching extremely effective with near-zero invalidation.
// Cache entry example
{
"key": "stage:inference:ns_acme:hash_abc123",
"value": [0.123, 0.456, 0.789, ...], // 1536-dim embedding
"inputs": {
"text": "dogs on skateboards",
"modality": "text"
},
"config": {
"model": "text-embedding-3-small",
"dimensions": 1536
}
}
Cache hit scenarios:
- ✅ Same query text
- ✅ Same image URL or file
- ✅ Same video URL or file
- ✅ Same model configuration
Cache miss scenarios:
- ❌ Different query text
- ❌ Different model name or version
- ❌ Different embedding dimensions
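A compact sketch of these hit/miss rules, assuming the key is a hash of the input plus the model configuration (the real key format is shown in the cache entry above):
# Sketch: the inference key depends only on the input and the embedding model config
import hashlib
import json

def inference_key(namespace: str, text: str, model: str, dimensions: int) -> str:
    payload = json.dumps(
        {"text": text, "model": model, "dimensions": dimensions}, sort_keys=True
    )
    return f"stage:inference:{namespace}:hash_{hashlib.sha256(payload.encode()).hexdigest()[:12]}"

k1 = inference_key("ns_acme", "dogs on skateboards", "text-embedding-3-small", 1536)
k2 = inference_key("ns_acme", "dogs on skateboards", "text-embedding-3-small", 1536)
k3 = inference_key("ns_acme", "dogs on skateboards", "text-embedding-3-large", 3072)

assert k1 == k2   # same text + same model config: HIT
assert k1 != k3   # different model or dimensions: MISS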
Vector Search Stage (KNN)
What gets cached: Document IDs and similarity scores from vector search
Cache key: embedding + filters + collections + limit
When to invalidate: When documents are added, updated, or deleted
// Cache entry example
{
"key": "stage:knn_search:ns_acme:hash_def456",
"value": [
{"document_id": "doc_123", "score": 0.95},
{"document_id": "doc_456", "score": 0.87},
{"document_id": "doc_789", "score": 0.82}
],
"inputs": {
"embedding": [0.123, 0.456, ...],
"collection_ids": ["col_articles"],
"filters": {"category": "tech"},
"limit": 10
}
}
Cache hit scenarios:
- ✅ Same embedding vector
- ✅ Same filters
- ✅ Same collection set
- ✅ Same limit/offset
Cache miss scenarios:
- ❌ Different embedding
- ❌ Documents added/updated/deleted in collection
- ❌ Different filters
- ❌ Different limit
Vector search cache is automatically invalidated when documents change via webhook events. This ensures you never serve stale results.
Reranking Stage
What gets cached: Reranked document IDs and scores
Cache key: document_ids + query + rerank_model + config
When to invalidate: Rarely (only when rerank model/config changes)
// Cache entry example
{
"key": "stage:rerank:ns_acme:hash_ghi789",
"value": [
{"document_id": "doc_789", "score": 0.99},
{"document_id": "doc_123", "score": 0.94},
{"document_id": "doc_456", "score": 0.88}
],
"inputs": {
"document_ids": ["doc_123", "doc_456", "doc_789"],
"query": "dogs on skateboards",
"strategy": "cross_encoder"
},
"config": {
"model": "ms-marco-MiniLM-L-12-v2",
"normalize": true
}
}
Cache hit scenarios:
- ✅ Same document set (order-independent; see the sketch after these lists)
- ✅ Same query text
- ✅ Same rerank strategy and model
- ✅ Same configuration
Cache miss scenarios:
- ❌ Different documents
- ❌ Different query
- ❌ Different rerank model
- ❌ Different strategy (e.g., cross-encoder → RRF)
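The order-independence above can be realized by canonicalizing the document set before hashing. A sketch of that approach (our assumption for illustration, not necessarily the exact internal implementation):
# Sketch: sorting document IDs makes the rerank key order-independent
import hashlib
import json

def rerank_key(namespace: str, document_ids: list, query: str, model: str) -> str:
    payload = json.dumps(
        {"document_ids": sorted(document_ids), "query": query, "model": model},
        sort_keys=True,
    )
    return f"stage:rerank:{namespace}:hash_{hashlib.sha256(payload.encode()).hexdigest()[:12]}"

a = rerank_key("ns_acme", ["doc_123", "doc_456", "doc_789"],
               "dogs on skateboards", "ms-marco-MiniLM-L-12-v2")
b = rerank_key("ns_acme", ["doc_789", "doc_123", "doc_456"],
               "dogs on skateboards", "ms-marco-MiniLM-L-12-v2")
assert a == b   # same document set, different order: same cache entry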
Why Per-Stage Caching Matters
Scenario 1: Changing Rerank Models
You want to experiment with different reranking models to improve relevance:
Initial Pipeline:
Embed → Search → Rerank (ms-marco)
User changes to:
Embed → Search → Rerank (bge-reranker)
With Traditional Caching:
❌ Full pipeline cache miss
→ Re-embed query ($$$)
→ Re-search vectors ($$)
→ Rerank with new model ($)
Total: 100% compute used
With Per-Stage Caching:
✅ Inference cache HIT (embedding unchanged)
✅ Vector search cache HIT (results unchanged)
❌ Rerank cache MISS (model changed)
Total: Only 5% compute used!
Scenario 2: Adding Documents
You ingest new documents into your collection:
Event: User adds 1,000 new documents
With Traditional Caching:
❌ Invalidate ALL cached queries
→ Every subsequent query is a full miss
→ Re-embed + re-search + re-rerank
Total: 100% compute for every query
With Per-Stage Caching:
✅ Inference cache unchanged (embeddings are deterministic)
❌ Vector search cache invalidated (index changed)
❌ Rerank cache invalidated (document set changed)
Next query:
✅ Inference: HIT (saved 60% of cost)
❌ Search: MISS
❌ Rerank: MISS
Total: Only 40% compute used
Scenario 3: Cross-Pipeline Reuse
You have multiple retrievers using the same embedding model:
Pipeline A: Embed (e5-small) → Search → Rerank
Pipeline B: Embed (e5-small) → Search → LLM Generate
Query: "artificial intelligence" via Pipeline A
→ Inference: MISS → cache embedding
→ Search: MISS → cache
→ Rerank: MISS → cache
Same query via Pipeline B:
→ Inference: ✅ HIT (shared with A!)
→ Search: ✅ HIT (shared with A!)
→ LLM Generate: MISS (different stage)
Result: Saved 66% of compute by sharing inference + search cache
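This sharing falls out naturally if the inference key is derived from the query text and model only, with no pipeline or retriever ID in it. A small, self-contained sketch of that assumption:
# Sketch: two pipelines embedding with the same model hit the same cache entry
import hashlib
import json

cache = {}
gpu_calls = 0

def cached_embedding(text: str, model: str):
    global gpu_calls
    key = "stage:inference:ns_acme:hash_" + hashlib.sha256(
        json.dumps({"text": text, "model": model}, sort_keys=True).encode()
    ).hexdigest()[:12]
    if key not in cache:
        gpu_calls += 1                 # the expensive call happens only on a MISS
        cache[key] = [0.1, 0.2, 0.3]   # placeholder embedding
    return cache[key]

cached_embedding("artificial intelligence", "e5-small")   # Pipeline A: MISS, computes
cached_embedding("artificial intelligence", "e5-small")   # Pipeline B: HIT, reuses
assert gpu_calls == 1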
Cache Invalidation
Automatic Invalidation
Mixpeek automatically invalidates stage caches based on data changes:
| Event | Stages Invalidated | Reason |
|---|---|---|
| Document added | Vector Search, Rerank | Index contents changed |
| Document updated | Vector Search, Rerank | Document set changed |
| Document deleted | Vector Search, Rerank | Index contents changed |
| Embedding model changed | Inference | Different model = different embeddings |
| Rerank model changed | Rerank | Different model = different scores |
Inference cache is almost never invalidated because embeddings are deterministic. The same input always produces the same output for a given model.
Webhook-Driven Invalidation
When you configure webhooks, cache invalidation happens automatically:
// Document update webhook
{
"event": "document.updated",
"payload": {
"namespace_id": "ns_acme",
"collection_id": "col_articles",
"document_id": "doc_123"
}
}
// Automatic cache invalidation:
// ❌ Vector Search cache: Cleared for col_articles
// ❌ Rerank cache: Cleared for col_articles
// ✅ Inference cache: Unchanged (embeddings still valid)
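The handler behind this is conceptually simple. A hedged sketch in Python, assuming Redis key patterns like the ones shown earlier (the real service scopes invalidation to the affected collection; this sketch clears at namespace scope for brevity):
# Hypothetical webhook handler (not the Mixpeek platform's internal code)
import redis  # pip install redis

r = redis.Redis()

def on_document_updated(payload: dict) -> None:
    ns = payload["namespace_id"]
    # Search results and rankings may now be stale...
    for pattern in (f"stage:knn_search:{ns}:*", f"stage:rerank:{ns}:*"):
        for key in r.scan_iter(match=pattern):
            r.delete(key)
    # ...but cached embeddings are still valid, so the inference cache is left alone.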
Invalidation by Stage
Each stage has its own invalidation logic:
Inference Stage:
- ✅ Almost never invalidated (only when the embedding model changes)
- Embeddings are deterministic for a given model
Vector Search Stage:
- ❌ Invalidated on document changes
- When: add/update/delete documents
- Scope: All searches for that collection
Rerank Stage:
- ❌ Rarely invalidated
- When: Rerank model/config changes
- Scope: All rerank operations with that model
Memory Management (LRU)
Why LRU Instead of TTL?
Traditional caching uses TTL (time-to-live) where entries expire after a fixed duration. Per-stage caching uses LRU (Least Recently Used) eviction instead:
| Approach | Behavior | Result |
|---|---|---|
| TTL | Popular queries expire arbitrarily | ❌ Wasted cache space |
| TTL | Unpopular queries waste memory | ❌ No automatic cleanup |
| TTL | Hard to tune (1hr? 1day?) | ❌ Guesswork |
| LRU | Most-used stays cached | ✅ Automatic optimization |
| LRU | Least-used auto-evicted | ✅ Self-cleaning |
| LRU | Bounded memory usage | ✅ Predictable costs |
How LRU Works
Redis Memory: 1GB (maxmemory limit - see Redis eviction policies)
Cache fills up:
Inference: 500MB (83K embeddings)
Search: 300MB (60K result sets)
Rerank: 200MB (200K rankings)
Total: 1GB (at limit)
New cache entry needs space:
1. Redis identifies least recently used key
2. Evicts that key automatically
3. Stores new entry
4. No manual intervention needed!
Result: Most popular queries stay cached naturally
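If you run your own Redis, the two settings that produce this behavior are maxmemory and maxmemory-policy allkeys-lru. A minimal sketch using redis-py (the 1GB value is illustrative):
# Sketch: bounding memory and enabling LRU eviction via redis-py
# (equivalent redis.conf directives: maxmemory 1gb / maxmemory-policy allkeys-lru)
import redis

r = redis.Redis()
r.config_set("maxmemory", "1gb")                 # hard ceiling on cache memory
r.config_set("maxmemory-policy", "allkeys-lru")  # evict least-recently-used keys first
print(r.config_get("maxmemory-policy"))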
Memory Distribution
Typical allocation for a retriever pipeline:
Total Redis: 1GB
Inference cache: 500MB (50%) ← Largest (embeddings are big)
Search cache: 300MB (30%) ← Medium (doc IDs + scores)
Rerank cache: 200MB (20%) ← Smallest (reordered IDs)
Adjust memory allocation based on your workload:
- Heavy embedding usage → allocate more to inference
- Complex filters → allocate more to search
- Multiple rerank models → allocate more to rerank
Benchmarks
Per-stage cache performance compared to uncached operations:
| Stage | Cache HIT | Uncached | Speedup |
|---|---|---|---|
| Inference (embedding) | ~0.4ms | ~100ms | 250x faster |
| Vector Search (KNN) | ~0.4ms | ~1500ms | 3750x faster |
| Reranking | ~0.4ms | ~200ms | 500x faster |
| Full Pipeline (all HITs) | ~1.2ms | ~1800ms | 1500x faster |
Partial Cache Hit Benefits
Even when some stages miss, you still save compute:
| Scenario | Stages | Time | Savings |
|---|---|---|---|
| All cached | ✅✅✅ | ~1.2ms | 99.9% |
| Inference + Search cached | ✅✅❌ | ~201ms | 89% |
| Only Inference cached | ✅❌❌ | ~1701ms | 6% |
| Full miss | ❌❌❌ | ~1800ms | 0% (baseline) |
Even a single stage cache hit saves significant compute. Inference caching alone saves ~100ms per query!
Cost Savings
Example for 1M queries/day:
Without caching:
1M queries × $0.002/query = $2,000/day
With 80% hit rate (all stages cached):
800K cached queries ≈ $0 (just Redis overhead)
200K uncached queries × $0.002 = $400/day
Savings: $1,600/day = $48K/month = $576K/year
With partial cache hits (20% full hit, 60% partial hit, 20% full miss):
200K full hits: ~$0
600K partial hits (inference cached): $600/day
200K full miss: $400/day
Savings: $1,000/day = $30K/month = $360K/year
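The arithmetic behind these figures as a quick sketch; the per-query rate and the assumption that a cached embedding covers roughly half the per-query cost come from the example above:
# Sketch of the savings arithmetic above
queries_per_day = 1_000_000
cost_per_query = 0.002                                   # $ per fully uncached query
baseline = queries_per_day * cost_per_query              # $2,000/day

# 80% of queries fully cached:
spend = 0.20 * queries_per_day * cost_per_query          # $400/day
print(baseline - spend)                                  # $1,600/day saved

# 20% full hit, 60% partial hit (inference cached, about half the cost), 20% full miss:
spend = 0.60 * queries_per_day * cost_per_query * 0.5 \
      + 0.20 * queries_per_day * cost_per_query          # $600 + $400 = $1,000/day
print(baseline - spend)                                  # $1,000/day saved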
Memory Efficiency
| Stage | Entry Size | 1GB Capacity |
|---|---|---|
| Inference | ~6KB (1536-dim embedding) | ~170K embeddings |
| Vector Search | ~5KB (10 results) | ~200K result sets |
| Reranking | ~1KB (10 reranked IDs) | ~1M result sets |
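The capacity column follows directly from the entry sizes; for example, for embeddings (assuming float32 storage):
# Sketch of the capacity math for the inference cache (float32 assumed)
dims, bytes_per_float = 1536, 4
entry_size = dims * bytes_per_float        # 6144 bytes, roughly 6KB per embedding
print((1 * 1024**3) // entry_size)         # roughly 170K embeddings per 1GB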
Best Practices
Optimizing Cache Hit Rates
1. Use consistent query formatting
# These are DIFFERENT cache keys:
{"text": "dogs"} # One key
{"text": "Dogs"} # Different key (case-sensitive)
{"text": " dogs "} # Different key (whitespace)
# Normalize queries client-side for better hit rates
query = text.lower().strip()
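A slightly fuller version of that normalization, as a hypothetical client-side helper applied before calling the API:
def normalize_query(text: str) -> str:
    # lowercase and collapse all runs of whitespace so trivial variants share a cache key
    return " ".join(text.lower().split())

assert normalize_query("Dogs") == normalize_query("  dogs  ") == "dogs"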
2. Reuse embeddings across pipelines
If multiple retrievers use the same embedding model,
they automatically share inference cache!
Pipeline A: embed(text-embedding-3-small) → search → rerank
Pipeline B: embed(text-embedding-3-small) → search → generate
Both pipelines share the inference cache ✅
3. Monitor memory usage
# Check Redis memory stats
redis-cli INFO memory
# Key metrics:
# - used_memory: Current usage
# - maxmemory: Configured limit
# - evicted_keys: How many keys were evicted (LRU)
# - keyspace_hits/keyspace_misses: Hit rate
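The same counters are available programmatically; a sketch using redis-py:
# Sketch: computing the overall hit rate from Redis' own counters
import redis

r = redis.Redis()
stats = r.info("stats")
hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
print(f"hit rate: {hits / max(hits + misses, 1):.1%}")
print("evicted keys (LRU):", stats["evicted_keys"])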
When Per-Stage Caching Helps Most
✅ High query repetition - Same queries asked frequently
✅ Model experimentation - Testing different rerank/generation models
✅ Frequent document updates - Inference cache remains valid
✅ Cross-pipeline workloads - Shared embeddings across retrievers
✅ Expensive inference - GPU-based embedding models
When It Helps Less
⚠️ Unique queries every time - No repeated patterns to cache
⚠️ Real-time data - Documents change every second
⚠️ Simple pipelines - Single-stage retrievers (less benefit)
Security & Compliance
Namespace Isolation
All cache keys include namespace ID for multi-tenancy security:
stage:inference:ns_acme:hash_abc123
stage:knn_search:ns_acme:hash_def456
Each tenant’s cache is completely isolated.
TLS Support
Use TLS for encrypted Redis connections:
# Use rediss:// protocol (note the extra 's')
export REDIS_URL="rediss://user:pass@prod-redis:6379"
GDPR Compliance
LRU eviction supports right-to-be-forgotten:
- Cached data is removed automatically once evicted
- No need for manual cleanup
- Bounded retention (determined by memory limit)
Real-World Examples
Example 1: Simple Query with Full Cache Hit
# First request - all stages miss
curl https://api.mixpeek.com/v1/retrievers/ret_semantic/execute \
-H "Authorization: Bearer $API_KEY" \
-H "X-Namespace: ns_acme" \
-d '{"inputs": {"text": "machine learning tutorials"}}'
# Internally:
# Stage 1 (Inference): MISS → Generate embedding (100ms)
# Stage 2 (Search): MISS → Query vectors (1500ms)
# Stage 3 (Rerank): MISS → Rerank results (200ms)
# Total: ~1800ms
# Second request (same query)
curl https://api.mixpeek.com/v1/retrievers/ret_semantic/execute \
-H "Authorization: Bearer $API_KEY" \
-H "X-Namespace: ns_acme" \
-d '{"inputs": {"text": "machine learning tutorials"}}'
# Internally:
# Stage 1 (Inference): HIT → Cached embedding (0.4ms)
# Stage 2 (Search): HIT → Cached results (0.4ms)
# Stage 3 (Rerank): HIT → Cached ranking (0.4ms)
# Total: ~1.2ms (1500x faster!)
Example 2: Partial Cache Hit (Model Change)
# User changes rerank model in retriever configuration
# Same query as before
curl https://api.mixpeek.com/v1/retrievers/ret_semantic/execute \
-H "Authorization: Bearer $API_KEY" \
-H "X-Namespace: ns_acme" \
-d '{"inputs": {"text": "machine learning tutorials"}}'
# Internally:
# Stage 1 (Inference): HIT → Cached embedding (0.4ms) ✅
# Stage 2 (Search): HIT → Cached results (0.4ms) ✅
# Stage 3 (Rerank): MISS → New model, must rerank (200ms) ❌
# Total: ~201ms (9x faster than full miss!)
Example 3: Partial Cache Hit (Document Update)
# User adds 100 new documents to collection
# Webhook triggers cache invalidation for search & rerank stages
curl https://api.mixpeek.com/v1/retrievers/ret_semantic/execute \
-H "Authorization: Bearer $API_KEY" \
-H "X-Namespace: ns_acme" \
-d '{"inputs": {"text": "machine learning tutorials"}}'
# Internally:
# Stage 1 (Inference): HIT → Cached embedding (0.4ms) ✅
# Stage 2 (Search): MISS → Index changed, must search (1500ms) ❌
# Stage 3 (Rerank): MISS → Doc set changed (200ms) ❌
# Total: ~1701ms (but saved 100ms from inference cache!)
Example 4: Cross-Pipeline Cache Sharing
# Pipeline A: embed → search → rerank
# Pipeline B: embed → search → LLM generate
# Execute Pipeline A
curl https://api.mixpeek.com/v1/retrievers/ret_pipeline_a/execute \
-d '{"inputs": {"text": "AI safety"}}'
# Pipeline A:
# Inference: MISS → cache
# Search: MISS → cache
# Rerank: MISS → cache
# Execute Pipeline B (same query, different last stage)
curl https://api.mixpeek.com/v1/retrievers/ret_pipeline_b/execute \
-d '{"inputs": {"text": "AI safety"}}'
# Pipeline B:
# Inference: HIT → shared with Pipeline A! ✅
# Search: HIT → shared with Pipeline A! ✅
# LLM Generate: MISS → different stage ❌
# Result: Saved ~66% of compute by sharing 2 of 3 stages with Pipeline A
Summary
Key Takeaways
✅ Per-stage caching enables partial cache hits
✅ LRU eviction eliminates TTL tuning
✅ Automatic operation - no configuration needed
✅ Cross-pipeline sharing - embeddings reused across retrievers
✅ Smart invalidation - only affected stages are cleared
When to Expect Big Wins
- 🔥 High query repetition (80%+ hit rate possible)
- 🔥 Model experimentation (inference cache persists)
- 🔥 Frequent updates (inference cache unaffected)
- 🔥 Multiple pipelines (shared embedding cache)