Observe API and Engine health, track background jobs via Tasks, and integrate logs/metrics to keep the platform healthy at scale.
Health checks
- API: /v1/health returns status for Redis, MongoDB, Qdrant, Celery, and Engine
- Reference: Health API
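As a rough illustration, a monitoring script could poll the endpoint and flag any component that is not reporting healthy. The response shape used here (a `services` object mapping component names to a status string, with `"ok"` meaning healthy) is an assumption made for this sketch, not the documented schema; check the Health API reference for the real fields.

```python
import json
import urllib.request

# Component names come from the list above; lowercase keys are an assumption.
EXPECTED_COMPONENTS = ("redis", "mongodb", "qdrant", "celery", "engine")


def unhealthy_components(payload, healthy_value="ok"):
    """Return expected components that are missing or not reporting healthy.

    Assumes a hypothetical payload such as {"services": {"redis": "ok", ...}}.
    """
    services = payload.get("services", {})
    return [name for name in EXPECTED_COMPONENTS
            if services.get(name) != healthy_value]


def fetch_health(base_url, timeout=5.0):
    """GET <base_url>/v1/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A cron job or probe could call `unhealthy_components(fetch_health(base_url))` and raise an alert whenever the returned list is non-empty.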
Logs & tracing
- API logs: Structured logs for requests, exceptions, and rate limits
- Engine logs: Ray task logs for extractors, clustering, and taxonomy flows
- Correlation: Propagate request IDs across API → Engine calls for traceability
- Errors: Centralized error models; alert on spikes and repeated failures
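One way to picture request ID propagation is a context variable that is set when a request arrives, stamped onto every log record, and copied into the headers of any downstream call. The sketch below uses only the Python standard library; the `X-Request-ID` header name and the helper names are assumptions for illustration and may not match what the platform actually uses.

```python
import contextvars
import logging
import uuid
from typing import Optional

REQUEST_ID_HEADER = "X-Request-ID"  # assumed header name
_request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record):
        record.request_id = _request_id.get()
        return True


def bind_request_id(incoming_headers: dict) -> str:
    """Reuse the caller's request ID if present, otherwise mint a new one."""
    rid = incoming_headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    _request_id.set(rid)
    return rid


def outbound_headers(extra: Optional[dict] = None) -> dict:
    """Headers for a downstream call, carrying the same request ID."""
    headers = dict(extra or {})
    headers[REQUEST_ID_HEADER] = _request_id.get()
    return headers
```

With the filter attached to a handler and a format string that includes `%(request_id)s`, log lines from both sides of a call can be joined on the same ID.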
Metrics
- Request throughput, latency percentiles, error rates
- Rate limit hits per route and namespace
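If no metrics backend is wired up yet, the same figures can be approximated from structured request logs. The sketch below assumes each record is a dict with `route`, `namespace`, `latency_ms`, and `status` keys, treats HTTP 5xx as errors, and counts HTTP 429 responses as rate limit hits; all of these are assumptions for illustration. Percentiles use the simple nearest-rank method.

```python
import math
from collections import Counter


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records):
    """Summarize latency, error rate, and rate limit hits for a log window."""
    latencies = [r["latency_ms"] for r in records]
    errors = sum(1 for r in records if r["status"] >= 500)
    # Assumption: a rate limit hit is surfaced as an HTTP 429 response.
    rate_limited = Counter(
        (r["route"], r["namespace"]) for r in records if r["status"] == 429
    )
    return {
        "requests": len(records),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate": errors / len(records),
        "rate_limit_hits": dict(rate_limited),
    }
```

Dividing `requests` by the window length in seconds gives throughput for that window.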
Tasks visibility
- Use Tasks to monitor batch and long‑running jobs
- List active tasks and drill into details; alert on FAILED/CANCELED
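Alerting on terminal task states can be as simple as filtering a task listing. The following sketch assumes each task is a dict with hypothetical `task_id` and `status` fields; only the FAILED and CANCELED status names come from the text above, and the real task schema is described in the Tasks reference.

```python
ALERT_STATUSES = {"FAILED", "CANCELED"}


def tasks_needing_alert(tasks):
    """Return tasks whose status is FAILED or CANCELED (case-insensitive)."""
    return [t for t in tasks
            if str(t.get("status", "")).upper() in ALERT_STATUSES]


def format_alert(task):
    """Build a short, human-readable alert line for one task."""
    task_id = task.get("task_id", "<unknown>")  # hypothetical field name
    return f"Task {task_id} ended with status {task['status']}"
```

A scheduler could run this against the latest task listing and forward each formatted line to the on-call channel.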
Dashboards & alerts
1. Dashboards: Build API, Engine, and datastore dashboards with your metrics backend
2. SLOs: Define target latency and error budgets; alert on burn rate
3. Runbooks: Document incident steps for extractor failures, clustering stalls, and Qdrant errors
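To make "alert on burn rate" concrete: burn rate is the observed error rate divided by the error budget (one minus the SLO target), so a value of 1.0 means the budget is being spent exactly as fast as the SLO allows. The sketch below is a minimal version under stated assumptions; the two-window check and the 14.4 threshold are commonly cited starting points from SRE practice, not values prescribed by this platform.

```python
def burn_rate(errors, total, slo_target):
    """Observed error rate divided by the error budget (1 - slo_target)."""
    if total == 0:
        return 0.0  # no traffic in the window, nothing burned
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return (errors / total) / budget


def should_page(short_window_rate, long_window_rate, threshold=14.4):
    """Page only when both a short and a long window exceed the threshold.

    Requiring both windows reduces noise from brief spikes. The default
    threshold is an assumed starting point and should be tuned.
    """
    return short_window_rate > threshold and long_window_rate > threshold
```

For example, 10 errors in 1,000 requests against a 99.9% target gives a burn rate of roughly 10, meaning the budget is being consumed about ten times faster than planned.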
References
- Health: /api-reference/health/healthcheck
- Tasks: /processing/tasks
- Clusters: /enrichment/clusters
- Taxonomies: /enrichment/taxonomies