Health & Status
- `GET /v1/health` – checks MongoDB, Qdrant, Redis, Celery, Engine, and ClickHouse (if analytics is enabled). Returns `OK` or `DEGRADED` with per-service errors.
- Tasks API – `/v1/tasks/{task_id}` and `/v1/tasks/list` expose status for batches, clustering jobs, taxonomy materialization, and migrations. All tasks use `TaskStatusEnum`.
- Webhooks – webhook events recorded in MongoDB provide a durable log of ingestion and enrichment milestones (`collection.documents.written`, etc.).
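A minimal polling sketch of these two checks, assuming a bearer-token API and illustrative JSON field names (base URL, auth scheme, and response layout are assumptions, not a documented contract):

```python
import requests

API = "https://api.example.com"                 # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}   # auth scheme is an assumption

# Check overall platform health; the per-service error layout shown is illustrative.
health = requests.get(f"{API}/v1/health", headers=HEADERS, timeout=10).json()
if health.get("status") != "OK":
    print("DEGRADED services:", health.get("errors"))

# Poll a long-running task (batch, clustering job, taxonomy materialization, ...).
task = requests.get(f"{API}/v1/tasks/<task_id>", headers=HEADERS, timeout=10).json()
print(task.get("status"))  # one of the TaskStatusEnum values
```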
Engine Monitoring
- Ray Dashboard (port 8265) – view worker health, task timelines, Serve deployments, resource utilization, and logs.
- Ray logs – pod logs (Kubernetes) or the Ray CLI provide detailed extractor and clustering output (`ray logs <job_id>`).
- Serve metrics – per-model latency and request counts; scrape via Prometheus or the Ray metrics endpoint.
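A quick sketch for spot-checking the Prometheus-format metrics Ray exposes; the head-node hostname and metrics export port are assumptions for your cluster:

```python
import requests

# Ray exports Prometheus-format metrics from each node; adjust host/port
# to match your cluster's metrics export configuration.
RAY_METRICS_URL = "http://ray-head:8080/metrics"  # hypothetical hostname/port

text = requests.get(RAY_METRICS_URL, timeout=10).text
# Surface Serve-related series (request counts, latency) for a quick look.
for line in text.splitlines():
    if "serve" in line and not line.startswith("#"):
        print(line)
```

In production you would normally let Prometheus scrape this endpoint and build dashboards on top, rather than polling it ad hoc.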
Analytics APIs
Enable analytics (`ENABLE_ANALYTICS=true`) to populate ClickHouse-backed metrics:
| Endpoint | Insight |
|---|---|
| `/v1/analytics/retrievers/{id}/performance` | Query volume, latency percentiles |
| `/v1/analytics/retrievers/{id}/stages` | Stage-level timing and candidate counts |
| `/v1/analytics/retrievers/{id}/signals` | Cache hits, rerank scores, filter reductions |
| `/v1/analytics/retrievers/{id}/cache-performance` | Hit/miss rates and latency delta |
| `/v1/analytics/retrievers/{id}/slow-queries` | Top slow queries with execution context |
| `/v1/analytics/usage/summary` | Credit and resource usage (billing support) |
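For example, a hedged sketch of pulling retriever performance numbers; the query parameters and response field names below are assumptions, so inspect the actual payload in your deployment:

```python
import requests

API = "https://api.example.com"                 # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}

retriever_id = "<retriever_id>"
resp = requests.get(
    f"{API}/v1/analytics/retrievers/{retriever_id}/performance",
    params={"from": "2024-01-01", "to": "2024-01-31"},  # assumed time-range params
    headers=HEADERS,
    timeout=10,
)
resp.raise_for_status()
stats = resp.json()
# Field names are illustrative; the endpoint reports query volume and latency percentiles.
print(stats.get("query_count"), stats.get("p95_latency_ms"))
```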
Logging & Tracing
- API layer – structured JSON logs include request IDs, namespace, HTTP status, error codes, and downstream latency.
- Celery workers – log task execution, retries, and webhook dispatch results.
- Ray workers – include extractor metrics, batch IDs, and queue stats; aggregate logs centrally for long-term retention.
- Correlation – propagate `x-request-id` from the API to Engine jobs via `additional_data.request_id` to stitch traces together.
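One way to wire up that correlation; the endpoint path and payload fields here are illustrative, not the platform's actual SDK:

```python
import uuid
import requests

API = "https://api.example.com"                 # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}

# Generate (or reuse) a request ID and send it both as the x-request-id header
# and inside additional_data so Engine-side logs carry the same identifier.
request_id = str(uuid.uuid4())
payload = {
    "documents": [],                                   # illustrative field
    "additional_data": {"request_id": request_id},
}
resp = requests.post(
    f"{API}/v1/collections/<collection_id>/documents",  # hypothetical endpoint
    json=payload,
    headers={**HEADERS, "x-request-id": request_id},
    timeout=30,
)
print(request_id, resp.status_code)  # grep logs across API, Celery, and Ray by this ID
```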
Metrics to Track
| Component | Key Metrics |
|---|---|
| API | Request rate, p95 latency, error rate, rate-limit hits |
| Celery | Queue depth, task execution time, retry count |
| Ray | Worker utilization (CPU/GPU), job duration, Serve requests in flight |
| MongoDB | Operation latency, primary health, replication lag |
| Qdrant | Memory usage, search latency, vector count per namespace |
| Redis | Connection count, command latency, cache hit ratio |
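Some of these metrics come from existing exporters; others are easy to sample directly. For instance, Celery queue depth can be read straight from the broker, as in this sketch that assumes Redis is the broker and the default `celery` queue name:

```python
import redis

# Connect to the Redis instance that backs Celery (URL is an assumption).
r = redis.Redis.from_url("redis://localhost:6379/0")

# With the Redis broker, each Celery queue is a list; LLEN gives its backlog.
for queue in ("celery",):  # add your named queues here
    depth = r.llen(queue)
    print(f"{queue}: {depth} pending tasks")
```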
Alerting Playbook
- Latency spike → check retriever analytics, stage statistics, and Ray Serve load.
- Task backlog → inspect Celery queue length, Redis health, and Ray worker availability.
- Failed enrichment → query `/v1/tasks/list` for `FAILED` tasks, inspect `error_message`, and review webhook events (see the sketch after this list).
- Storage saturation → monitor Qdrant RAM usage and MongoDB disk consumption; scale storage or shard by namespace.
- Cache regression → view cache hit-rate endpoint; adjust TTLs or stage cache configuration.
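A small sketch for the failed-enrichment check; the `status` filter parameter and response shape are assumptions:

```python
import requests

API = "https://api.example.com"                 # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}

# List recent tasks and keep only failures; the filter parameter name is assumed.
resp = requests.get(
    f"{API}/v1/tasks/list",
    params={"status": "FAILED"},
    headers=HEADERS,
    timeout=10,
)
resp.raise_for_status()
for task in resp.json().get("tasks", []):       # response field names are illustrative
    print(task.get("task_id"), task.get("error_message"))
```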
Dashboards to Build
- API dashboard – health endpoint status, request latency, error breakdown, rate-limit counters.
- Engine dashboard – Ray worker utilization, job runtime percentiles, extractor throughput, Serve queue depth.
- Retrieval performance – retriever analytics charts (latency, cache hits, slow queries).
- Storage dashboard – MongoDB/Redis/Qdrant metrics for capacity planning.
- Task tracker – open tasks by status, median processing times, failure rates.
Incident Response Tips
- Keep runbooks for common failures (e.g., extractor timeouts, Qdrant restarts).
- Use webhook history to confirm whether ingestion completed or stalled.
- Capture Ray job IDs from task metadata to replay logs quickly (see the sketch after this list).
- Snapshot retriever and collection configurations when debugging to ensure you’re reproducing the same pipeline.
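For the Ray-log tip, a hedged sketch; the metadata field holding the Ray job ID is an assumption for your deployment:

```python
import subprocess
import requests

API = "https://api.example.com"                 # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}

# Fetch the task record and pull the Ray job ID out of its metadata
# (the "ray_job_id" field name is illustrative).
task = requests.get(f"{API}/v1/tasks/<task_id>", headers=HEADERS, timeout=10).json()
ray_job_id = task.get("metadata", {}).get("ray_job_id")

if ray_job_id:
    # Replay the job's logs via the Ray CLI mentioned under Engine Monitoring.
    subprocess.run(["ray", "logs", ray_job_id], check=False)
```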

