Understanding these limits and quotas helps you avoid 429 errors and unexpected costs.
API Rate Limits
- Per-route counters backed by Redis (SlowAPI).
- Tier defaults (requests/minute):
  - Free: 10 (general), 10 (retriever execute, uploads)
  - Pro: 60
  - Enterprise: configurable (contact support)
- Responses include `details.api_name` to identify which bucket tripped the limit.
- Implement exponential backoff with jitter (1s, 2s, 4s, …).
- Batch uploads using `/objects/batch` and process via batches instead of hot loops (see the sketch after this list).
- Enable retriever caching to reduce duplicate execute calls.
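A minimal sketch of batching object creation through `/objects/batch` instead of calling the API once per object. The endpoint path comes from the docs above; the base URL, auth header, and payload shape are assumptions — check the API reference for the exact request format.

```python
import requests

API_URL = "https://api.mixpeek.com"   # assumed base URL
API_KEY = "sk-..."                    # your API key

def upload_objects_in_batches(objects, batch_size=50):
    """Submit objects in chunks via /objects/batch instead of a hot loop of single calls."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    for start in range(0, len(objects), batch_size):
        chunk = objects[start:start + batch_size]
        resp = requests.post(
            f"{API_URL}/objects/batch",
            json={"objects": chunk},   # payload shape is an assumption
            headers=headers,
            timeout=30,
        )
        resp.raise_for_status()
```

Chunking keeps each request small enough to stay under per-request limits while still consuming far fewer rate-limit tokens than one call per object.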
Credits & Usage
Credits measure compute-intensive activity (extractions, retriever execution, LLM stages). Track usage with the analytics endpoints and the `budget` section in execution responses. Set `budget_limits` on retrievers to cap runtime and credit spend (see the sketch below).
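A hedged sketch of capping spend on a retriever execution. `budget_limits` and the `budget` response section come from the docs above, but the endpoint path, the retriever ID, and the inner field names (`max_credits`, `max_time_ms`) are assumptions.

```python
import requests

API_URL = "https://api.mixpeek.com"   # assumed base URL
HEADERS = {"Authorization": "Bearer sk-..."}

retriever_id = "ret_123"  # hypothetical retriever ID

# Execute a retriever with a budget cap; inner field names are illustrative.
resp = requests.post(
    f"{API_URL}/retrievers/{retriever_id}/execute",   # assumed path
    json={
        "query": {"text": "quarterly revenue trends"},
        "budget_limits": {"max_credits": 5, "max_time_ms": 10_000},  # assumed keys
    },
    headers=HEADERS,
    timeout=30,
)
resp.raise_for_status()

# The execution response includes a budget section reporting what was actually spent.
print(resp.json().get("budget"))
```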
Task Backlog Limits
- Celery queues and Redis memory impose soft limits on concurrent tasks.
- Monitor queue depth and auto-scale workers or throttle submissions if tasks pile up (see the sketch after this list).
- Use batch sizes that match available Ray worker capacity.
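A minimal sketch of checking backlog depth on a Redis-backed Celery broker and pausing submissions when the queue is deep. The queue name `celery` is Celery's default; the broker location and threshold are assumptions for illustration.

```python
import time
import redis

r = redis.Redis(host="localhost", port=6379)  # assumed broker location
MAX_BACKLOG = 500                             # assumed threshold

def wait_for_queue_headroom(queue_name: str = "celery", poll_seconds: int = 5) -> None:
    """Block new submissions while the Celery queue backlog exceeds the threshold."""
    while r.llen(queue_name) > MAX_BACKLOG:
        time.sleep(poll_seconds)

# Call before submitting another batch of tasks:
# wait_for_queue_headroom()
# submit_batch(...)
```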
Ray Serve & Model Quotas
- Each model deployment has `min_replicas`/`max_replicas` and per-request concurrency targets (see the sketch after this list).
- For LLM providers (OpenAI, Anthropic, etc.), respect provider-side rate limits; Mixpeek surfaces provider errors directly.
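For illustration, a Ray Serve deployment with replica bounds and a concurrency target. `autoscaling_config` and `max_ongoing_requests` are standard Ray Serve options, but the model class and values here are assumptions, and exact option names vary by Ray version.

```python
from ray import serve

@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,             # keep one replica warm
        "max_replicas": 4,             # cap replica count (and GPU spend)
        "target_ongoing_requests": 8,  # scale up when per-replica load exceeds this
    },
    max_ongoing_requests=16,           # per-replica concurrency limit
)
class EmbeddingModel:
    def __call__(self, request):
        # Hypothetical inference handler; replace with the real model call.
        return {"embedding": [0.0] * 768}

app = EmbeddingModel.bind()
# serve.run(app)  # deploy locally for testing
```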
Storage Considerations
| Component | Limit | Mitigation |
|---|---|---|
| S3 | Object count/size | Enable lifecycle policies, compress artifacts |
| MongoDB | Document size (16 MB) | Keep metadata lean; store large payloads in S3 |
| Qdrant | Memory | Shard by namespace, prune stale documents, upgrade memory |
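As one example of the S3 mitigation above, a lifecycle rule that expires derived artifacts after 30 days. The bucket name and prefix are assumptions; `put_bucket_lifecycle_configuration` is the standard boto3 call.

```python
import boto3

s3 = boto3.client("s3")

# Expire derived artifacts after 30 days; bucket and prefix are illustrative.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-mixpeek-artifacts",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-derived-artifacts",
                "Filter": {"Prefix": "artifacts/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```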
Handling 429 Too Many Requests
- Inspect `error.details.api_name`.
- Back off and retry later (respect `Retry-After` headers if supplied; see the sketch after this list).
- Reduce concurrency from your client or cache repeated reads.
- Consider upgrading your tier if usage is sustained.
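A minimal retry sketch combining these steps with a `requests`-based client: it honors `Retry-After` when present, otherwise backs off exponentially with jitter, and logs `details.api_name` from the error body (the exact error payload shape is an assumption).

```python
import random
import time
import requests

def request_with_backoff(method, url, max_retries=5, **kwargs):
    """Retry on 429, honoring Retry-After and falling back to exponential backoff with jitter."""
    for attempt in range(max_retries):
        resp = requests.request(method, url, **kwargs)
        if resp.status_code != 429:
            return resp

        # Identify which rate-limit bucket tripped (field name per the docs above).
        try:
            api_name = resp.json().get("error", {}).get("details", {}).get("api_name")
        except ValueError:
            api_name = None
        print(f"429 from bucket {api_name!r}, attempt {attempt + 1}")

        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            delay = (2 ** attempt) + random.uniform(0, 1)  # 1s, 2s, 4s, ... plus jitter
        time.sleep(delay)
    return resp
```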
Monitoring Usage
- Log API responses for `429`, `403`, and `NotEnabledError` to detect limit issues (a sketch follows this list).
- Build dashboards using analytics endpoints (retriever performance, cache hit rate, slow queries) to see when workloads approach limits.
- Use webhook events to trigger scaling actions (e.g., new batch submitted → scale Ray workers).
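For the logging bullet above, a small wrapper that flags limit-related responses. Checking for `NotEnabledError` in the response body is an assumption about the error payload; adjust to the actual error format.

```python
import logging
import requests

logger = logging.getLogger("mixpeek.limits")

def log_limit_issues(resp: requests.Response) -> requests.Response:
    """Record responses that indicate rate limits, permission blocks, or disabled features."""
    if resp.status_code in (429, 403):
        logger.warning("Limit issue: %s %s -> %s", resp.request.method, resp.url, resp.status_code)
    elif "NotEnabledError" in resp.text:  # assumed to appear in the error body
        logger.warning("Feature not enabled: %s", resp.url)
    return resp

# Usage: resp = log_limit_issues(requests.get(url, headers=headers))
```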
When to Contact Support
- You need higher rate limits or custom credit pools.
- You plan to run batch workloads that exceed current quotas.
- Provider-specific limits (LLM APIs) block your workflow even after retries.

