Understanding these limits and quotas helps you avoid 429 errors and unexpected costs.
API Rate Limits
- Per-route counters backed by Redis (SlowAPI).
- Tier defaults (requests/minute):
  - Free: 10 (general), 10 (retriever execute, uploads)
  - Pro: 60
  - Enterprise: configurable (contact support)
- Responses include `details.api_name` to identify which bucket tripped the limit.
- Implement exponential backoff with jitter (1s, 2s, 4s, …).
- Batch uploads using `/objects/batch` and process via batches instead of hot loops (see the sketch after this list).
- Enable retriever caching to reduce duplicate execute calls.
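A minimal sketch of batching object creation through `/objects/batch` instead of calling the API once per object. The endpoint path comes from the docs above; the base URL, auth header, and payload shape are assumptions — check the API reference for the exact request format.

```python
import requests

API_URL = "https://api.mixpeek.com"   # assumed base URL
API_KEY = "sk-..."                    # your API key

def upload_objects_in_batches(objects, batch_size=50):
    """Submit objects in chunks via /objects/batch instead of a hot loop of single calls."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    for start in range(0, len(objects), batch_size):
        chunk = objects[start:start + batch_size]
        resp = requests.post(
            f"{API_URL}/objects/batch",
            json={"objects": chunk},   # payload shape is an assumption
            headers=headers,
            timeout=30,
        )
        resp.raise_for_status()
```

Chunking keeps each request small enough to stay under per-request limits while still consuming far fewer rate-limit tokens than one call per object.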
Credits & Usage
Credits measure compute-intensive activity (extractions, retriever execution, LLM stages). Track usage with the analytics endpoints and the `budget` section in execution responses. Set `budget_limits` on retrievers to cap runtime and credit spend (see the sketch below).
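A hedged sketch of capping spend on a retriever execution. `budget_limits` and the `budget` response section come from the docs above, but the endpoint path, the retriever ID, and the inner field names (`max_credits`, `max_time_ms`) are assumptions.

```python
import requests

API_URL = "https://api.mixpeek.com"   # assumed base URL
HEADERS = {"Authorization": "Bearer sk-..."}

retriever_id = "ret_123"  # hypothetical retriever ID

# Execute a retriever with a budget cap; inner field names are illustrative.
resp = requests.post(
    f"{API_URL}/retrievers/{retriever_id}/execute",   # assumed path
    json={
        "query": {"text": "quarterly revenue trends"},
        "budget_limits": {"max_credits": 5, "max_time_ms": 10_000},  # assumed keys
    },
    headers=HEADERS,
    timeout=30,
)
resp.raise_for_status()

# The execution response includes a budget section reporting what was actually spent.
print(resp.json().get("budget"))
```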
Task Backlog Limits
- Celery queues and Redis memory impose soft limits on concurrent tasks.
- Monitor queue depth and auto-scale workers or throttle submissions if tasks pile up (see the sketch after this list).
- Use batch sizes that match available Ray worker capacity.
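A minimal sketch of checking backlog depth on a Redis-backed Celery broker and pausing submissions when the queue is deep. The queue name `celery` is Celery's default; the broker location and threshold are assumptions for illustration.

```python
import time
import redis

r = redis.Redis(host="localhost", port=6379)  # assumed broker location
MAX_BACKLOG = 500                             # assumed threshold

def wait_for_queue_headroom(queue_name: str = "celery", poll_seconds: int = 5) -> None:
    """Block new submissions while the Celery queue backlog exceeds the threshold."""
    while r.llen(queue_name) > MAX_BACKLOG:
        time.sleep(poll_seconds)

# Call before submitting another batch of tasks:
# wait_for_queue_headroom()
# submit_batch(...)
```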
Ray Serve & Model Quotas
- Each model deployment has `min_replicas`/`max_replicas` and per-request concurrency targets (see the sketch after this list).
- For LLM providers (OpenAI, Anthropic, etc.), respect provider-side rate limits; Mixpeek surfaces provider errors directly.
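For illustration, a Ray Serve deployment with replica bounds and a concurrency target. `autoscaling_config` and `max_ongoing_requests` are standard Ray Serve options, but the model class and values here are assumptions, and exact option names vary by Ray version.

```python
from ray import serve

@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,             # keep one replica warm
        "max_replicas": 4,             # cap replica count (and GPU spend)
        "target_ongoing_requests": 8,  # scale up when per-replica load exceeds this
    },
    max_ongoing_requests=16,           # per-replica concurrency limit
)
class EmbeddingModel:
    def __call__(self, request):
        # Hypothetical inference handler; replace with the real model call.
        return {"embedding": [0.0] * 768}

app = EmbeddingModel.bind()
# serve.run(app)  # deploy locally for testing
```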
Storage Considerations
| Component | Limit | Mitigation |
|---|---|---|
| S3 | Object count/size | Enable lifecycle policies, compress artifacts |
| MongoDB | Document size (16 MB) | Keep metadata lean; store large payloads in S3 |
| Qdrant | Memory | Shard by namespace, prune stale documents, upgrade memory |
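As one example of the S3 mitigation above, a lifecycle rule that expires derived artifacts after 30 days. The bucket name and prefix are assumptions; `put_bucket_lifecycle_configuration` is the standard boto3 call.

```python
import boto3

s3 = boto3.client("s3")

# Expire derived artifacts after 30 days; bucket and prefix are illustrative.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-mixpeek-artifacts",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-derived-artifacts",
                "Filter": {"Prefix": "artifacts/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```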
Handling 429 Too Many Requests
- Inspect `error.details.api_name`.
- Back off and retry later (respect `Retry-After` headers if supplied; see the sketch after this list).
- Reduce concurrency from your client or cache repeated reads.
- Consider upgrading your tier if usage is sustained.
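A minimal retry sketch combining these steps with a `requests`-based client: it honors `Retry-After` when present, otherwise backs off exponentially with jitter, and logs `details.api_name` from the error body (the exact error payload shape is an assumption).

```python
import random
import time
import requests

def request_with_backoff(method, url, max_retries=5, **kwargs):
    """Retry on 429, honoring Retry-After and falling back to exponential backoff with jitter."""
    for attempt in range(max_retries):
        resp = requests.request(method, url, **kwargs)
        if resp.status_code != 429:
            return resp

        # Identify which rate-limit bucket tripped (field name per the docs above).
        try:
            api_name = resp.json().get("error", {}).get("details", {}).get("api_name")
        except ValueError:
            api_name = None
        print(f"429 from bucket {api_name!r}, attempt {attempt + 1}")

        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            delay = (2 ** attempt) + random.uniform(0, 1)  # 1s, 2s, 4s, ... plus jitter
        time.sleep(delay)
    return resp
```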
Monitoring Usage
- Log API responses for `429`, `403`, and `NotEnabledError` to detect limit issues (a sketch follows this list).
- Build dashboards using analytics endpoints (retriever performance, cache hit rate, slow queries) to see when workloads approach limits.
- Use webhook events to trigger scaling actions (e.g., new batch submitted → scale Ray workers).
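For the logging bullet above, a small wrapper that flags limit-related responses. Checking for `NotEnabledError` in the response body is an assumption about the error payload; adjust to the actual error format.

```python
import logging
import requests

logger = logging.getLogger("mixpeek.limits")

def log_limit_issues(resp: requests.Response) -> requests.Response:
    """Record responses that indicate rate limits, permission blocks, or disabled features."""
    if resp.status_code in (429, 403):
        logger.warning("Limit issue: %s %s -> %s", resp.request.method, resp.url, resp.status_code)
    elif "NotEnabledError" in resp.text:  # assumed to appear in the error body
        logger.warning("Feature not enabled: %s", resp.url)
    return resp

# Usage: resp = log_limit_issues(requests.get(url, headers=headers))
```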
When to Contact Support
- You need higher rate limits or custom credit pools.
- You plan to run batch workloads that exceed current quotas.
- Provider-specific limits (LLM APIs) block your workflow even after retries.

