Mixpeek enforces limits at multiple layers—API rate limits, credit consumption, retriever budgets, and infrastructure quotas. Understanding each layer helps you avoid 429 errors and unexpected costs.

API Rate Limits

  • Per-route counters backed by Redis (SlowAPI).
  • Tier defaults (requests/minute):
    • Free: 10 for general routes; 10 for retriever execution and uploads
    • Pro: 60
    • Enterprise: configurable (contact support)
  • Responses include details.api_name to identify which bucket tripped the limit.
Recommendations
  1. Implement exponential backoff with jitter (1s, 2s, 4s, …).
  2. Batch uploads with /objects/batch and submit work in batches rather than hot per-object loops (see the sketch after this list).
  3. Enable retriever caching to reduce duplicate execute calls.
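As an illustration of item 2, here is a sketch of a batched submission. The /objects/batch route comes from the recommendation above, but the request body (collection_id, objects, and their fields) is an assumed shape, not the documented schema:
# Illustrative payload: collection_id and object fields are assumptions, not the documented schema.
curl -sS -X POST "$MP_API_URL/v1/objects/batch" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_id": "col_123",
    "objects": [
      {"url": "s3://my-bucket/video-001.mp4"},
      {"url": "s3://my-bucket/video-002.mp4"}
    ]
  }'
A single batched call counts against the per-route limiter once, instead of once per object.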

Credits & Usage

Credits measure compute-intensive activity (extractions, retriever execution, LLM stages). Track usage with analytics endpoints:
curl -sS -X GET "$MP_API_URL/v1/analytics/usage/summary?start_date=2025-10-01&end_date=2025-10-31" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE"
Monitor retriever cost with the budget section in execution responses:
"budget": {
  "credits_used": 18.75,
  "credits_limit": 100,
  "time_elapsed_ms": 5250
}
Set budget_limits on retrievers to cap runtime and credit spend.
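As a sketch of what that might look like, assuming retrievers accept a budget_limits object on update (the endpoint path, the $RETRIEVER_ID variable, and the max_credits / max_time_ms field names are illustrative, mirroring the budget block above):
# Illustrative only: endpoint path and budget_limits field names are assumptions, not the documented schema.
curl -sS -X PATCH "$MP_API_URL/v1/retrievers/$RETRIEVER_ID" \
  -H "Authorization: Bearer $MP_API_KEY" \
  -H "X-Namespace: $MP_NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{"budget_limits": {"max_credits": 100, "max_time_ms": 30000}}'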

Task Backlog Limits

  • Celery queues and Redis memory impose soft limits on concurrent tasks.
  • Monitor queue depth; auto-scale workers or throttle submissions if tasks pile up (see the sketch after this list).
  • Use batch sizes that match available Ray worker capacity.
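A minimal way to watch queue depth, assuming Celery's default Redis broker where the default queue is a Redis list named celery ($REDIS_URL, the queue key, and the threshold are placeholders for your setup):
# Poll the Celery queue length every 30 s; warn when the backlog grows past a threshold.
while true; do
  depth=$(redis-cli -u "$REDIS_URL" LLEN celery)
  echo "$(date -u +%FT%TZ) queue_depth=$depth"
  if [ "$depth" -gt 1000 ]; then
    echo "Backlog above 1000 tasks; scale workers or pause submissions." >&2
  fi
  sleep 30
done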

Ray Serve & Model Quotas

  • Each model deployment has min_replicas/max_replicas and per-replica concurrency targets (see the sketch after this list).
  • For LLM providers (OpenAI, Anthropic, etc.), respect provider-side rate limits; Mixpeek surfaces provider errors directly.
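For orientation, a generic Ray Serve config sketch (not Mixpeek's actual deployment; the application, import path, and deployment names are placeholders, and the autoscaling field names vary slightly across Ray versions) showing where the replica bounds and concurrency target live:
# Placeholder Ray Serve config; apply with `serve deploy`.
cat > serve_config.yaml <<'EOF'
applications:
  - name: embedding_app                 # placeholder application name
    import_path: models.app:deployment  # placeholder import path
    deployments:
      - name: EmbeddingModel            # placeholder deployment name
        autoscaling_config:
          min_replicas: 1
          max_replicas: 4
          target_num_ongoing_requests_per_replica: 2
EOF
serve deploy serve_config.yaml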

Storage Considerations

  • S3: object count and size limits. Mitigation: enable lifecycle policies and compress artifacts.
  • MongoDB: 16 MB document size limit. Mitigation: keep metadata lean and store large payloads in S3.
  • Qdrant: memory pressure. Mitigation: shard by namespace, prune stale documents, or upgrade memory.
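For S3, a hedged example of a lifecycle rule (the bucket name, prefix, and retention period are placeholders) that expires derived artifacts after 90 days:
# Placeholder bucket, prefix, and retention; adjust to your artifact layout.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-derived-artifacts",
      "Status": "Enabled",
      "Filter": { "Prefix": "artifacts/" },
      "Expiration": { "Days": 90 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-mixpeek-artifacts \
  --lifecycle-configuration file://lifecycle.json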

Handling 429 Too Many Requests

  1. Inspect error.details.api_name.
  2. Back off and retry later, respecting the Retry-After header if supplied (see the sketch after this list).
  3. Reduce concurrency from your client or cache repeated reads.
  4. Consider upgrading your tier if usage is sustained.
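A minimal retry sketch, assuming a retriever execute route at /v1/retrievers/{id}/execute with a simple JSON body (both are illustrative, not the documented schema); it honors Retry-After when present and otherwise backs off exponentially with jitter (1s, 2s, 4s, ...):
# Illustrative endpoint and body; only the retry logic is the point here.
for attempt in 0 1 2 3 4; do
  status=$(curl -sS -D headers.txt -o response.json -w "%{http_code}" \
    -X POST "$MP_API_URL/v1/retrievers/$RETRIEVER_ID/execute" \
    -H "Authorization: Bearer $MP_API_KEY" \
    -H "X-Namespace: $MP_NAMESPACE" \
    -H "Content-Type: application/json" \
    -d '{"inputs": {"query": "product demo"}}')
  [ "$status" != "429" ] && break
  retry_after=$(awk 'tolower($1) == "retry-after:" {print $2}' headers.txt | tr -d '\r')
  sleep "${retry_after:-$(( (1 << attempt) + RANDOM % 2 ))}"
done
cat response.json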

Monitoring Usage

  • Log API responses for 429, 403, and NotEnabledError to detect limit issues (see the sketch after this list).
  • Build dashboards using analytics endpoints (retriever performance, cache hit rate, slow queries) to see when workloads approach limits.
  • Use webhook events to trigger scaling actions (e.g., new batch submitted → scale Ray workers).
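As a sketch, assuming you write API responses to a JSON-lines log with status and error fields (the field names and log path below are illustrative), a quick filter that counts limit-related failures per API bucket:
# Field names (.status, .error.type, .error.details.api_name) and the log path are assumptions.
jq -r 'select(.status == 429 or .status == 403 or .error.type == "NotEnabledError")
       | .error.details.api_name // "unknown"' api_responses.log \
  | sort | uniq -c | sort -rn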

When to Contact Support

  • You need higher rate limits or custom credit pools.
  • You plan to run batch workloads that exceed current quotas.
  • Provider-specific limits (LLM APIs) block your workflow even after retries.
Include namespace, organization, relevant request IDs, and desired limits when reaching out.
