- API Layer – FastAPI + Celery + Redis (HTTP endpoints, task orchestration, webhooks).
- Engine Layer – Ray cluster + Ray Serve (extractors, inference, clustering, taxonomy runs).
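This split can be illustrated with a minimal sketch: a FastAPI route enqueues a Celery task over the Redis broker, and the worker forwards the work to the Engine layer over HTTP. The route, task, and payload names below are illustrative placeholders, not the actual API surface.

```python
# Sketch of the API-layer flow: FastAPI enqueues work on Celery (Redis broker),
# and the Celery worker forwards it to the Ray-backed Engine over HTTP.
# Names (submit_batch, /run, /v1/batches) are illustrative placeholders.
import os

import requests
from celery import Celery
from fastapi import FastAPI

ENGINE_API_URL = os.getenv("ENGINE_API_URL", "http://localhost:8265")
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")

celery_app = Celery("api", broker=REDIS_URL, backend=REDIS_URL)
app = FastAPI()


@celery_app.task
def submit_batch(payload: dict) -> dict:
    # Forward the batch to the Engine layer and return its response.
    resp = requests.post(f"{ENGINE_API_URL}/run", json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()


@app.post("/v1/batches")
def create_batch(payload: dict):
    # Enqueue the task; the HTTP response returns immediately with the task id.
    task = submit_batch.delay(payload)
    return {"task_id": task.id}
```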
## Local Development
The `./start.sh` script spins up the full stack with Docker Compose:
- `mongodb` – metadata (mongodb://localhost:27017)
- `qdrant` – vector storage (http://localhost:6333)
- `redis` – task queue/cache (redis://localhost:6379)
- `localstack` – S3 emulator (http://localhost:4566)
Run `curl http://localhost:8000/v1/health` to confirm readiness, then follow the Quickstart.
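If you prefer a scripted check, the readiness poll below waits for the stack to come up before running the Quickstart; the 60-second budget is an arbitrary choice.

```python
# Poll the API health endpoint until the local stack reports ready (or time out).
import sys
import time

import requests

HEALTH_URL = "http://localhost:8000/v1/health"
DEADLINE = time.time() + 60  # arbitrary 60 s budget for the containers to start

while time.time() < DEADLINE:
    try:
        resp = requests.get(HEALTH_URL, timeout=5)
        if resp.ok:
            print("stack ready:", resp.json())
            sys.exit(0)
    except requests.ConnectionError:
        pass  # API container not accepting connections yet
    time.sleep(2)

print("stack did not become healthy in time", file=sys.stderr)
sys.exit(1)
```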
## Production Topology (Kubernetes)
- API nodes – general purpose (e.g., `t3.xlarge`); scale FastAPI/Celery horizontally.
- CPU workers – compute-optimized (e.g., `c5.4xlarge`) for text extraction and clustering.
- GPU workers – GPU instances (e.g., `p3.2xlarge`) for embeddings, rerankers, and video processing.
## Managed Ray (Anyscale / Ray Service)
- Deploy the Engine layer via a managed Ray service.
- Point the API layer to the Ray cluster using `ENGINE_API_URL` and Ray job submission credentials (see the sketch below).
- Managed Ray handles autoscaling, node health, and GPU provisioning; you manage the API layer and data stores.
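Pointing the API layer at a managed Ray cluster typically goes through Ray's job submission client. In the sketch below, `ENGINE_API_URL` is assumed to point at the Ray job endpoint, and `RAY_JOB_TOKEN` and the entrypoint module are placeholders for whatever your managed provider issues.

```python
# Sketch: submit an Engine job to a managed Ray cluster from the API layer.
# RAY_JOB_TOKEN and the entrypoint module are placeholders.
import os

from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient(
    address=os.environ["ENGINE_API_URL"],  # assumed to be the Ray job submission endpoint
    headers={"Authorization": f"Bearer {os.environ['RAY_JOB_TOKEN']}"},
)

job_id = client.submit_job(
    entrypoint="python -m engine.run_taxonomy",  # illustrative module name
    runtime_env={"env_vars": {"MONGO_URI": os.environ["MONGO_URI"]}},
)
print("submitted", job_id, client.get_job_status(job_id))
```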
## Core Environment Variables
| Service | Key | Description |
|---|---|---|
| API | `MONGO_URI`, `QDRANT_URL`, `REDIS_URL`, `S3_BUCKET`, `ENGINE_API_URL` | Data store and Engine connectivity |
| Engine | `MONGO_URI`, `QDRANT_URL`, `S3_BUCKET`, `RAY_memory`, `RAY_num_gpus` | Runtime configuration |
| Shared | `ENABLE_ANALYTICS`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc. | Optional providers |
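One way to centralize these settings is a small config module read from the environment at startup. This is only a sketch: the defaults mirror the local Docker Compose stack, and the Engine URL default is an assumed placeholder.

```python
# Central place to read the core environment variables; defaults match the local stack.
import os

MONGO_URI = os.getenv("MONGO_URI", "mongodb://localhost:27017")
QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")
S3_BUCKET = os.getenv("S3_BUCKET", "local-bucket")              # served by localstack in dev
ENGINE_API_URL = os.getenv("ENGINE_API_URL", "http://localhost:8265")  # placeholder default

# Optional providers: only enabled when the keys are present.
ENABLE_ANALYTICS = os.getenv("ENABLE_ANALYTICS", "false").lower() == "true"
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
```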
## Health & Verification
- Endpoint: `GET /v1/health` – checks Redis, MongoDB, Qdrant, Celery, Engine, and ClickHouse (if enabled).
- Smoke test: create namespace → bucket → collection → upload object → submit batch → execute retriever (scripted below).
- Tasks: ensure Celery workers process webhook events, cache invalidations, and maintenance tasks.
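The smoke test can be scripted as a linear sequence of API calls. The endpoint paths and payload fields below are illustrative placeholders, not the actual routes; substitute the real API surface from the Quickstart.

```python
# Smoke test sketch: walk the namespace → bucket → collection → object → batch → retriever
# flow against a running deployment. Paths and payloads are placeholders.
import requests

BASE = "http://localhost:8000/v1"
session = requests.Session()


def post(path: str, **payload) -> dict:
    resp = session.post(f"{BASE}/{path}", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()


ns = post("namespaces", name="smoke-test")
bucket = post("buckets", namespace_id=ns["id"], name="docs")
coll = post("collections", namespace_id=ns["id"], name="chunks")
obj = post("objects", bucket_id=bucket["id"], url="https://example.com/sample.pdf")
batch = post("batches", collection_id=coll["id"], object_ids=[obj["id"]])
result = post("retrievers/execute", collection_id=coll["id"], query="smoke test")
print("retriever returned", len(result.get("items", [])), "items")
```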
## Scaling Guidelines
| Component | Scaling Strategy | Notes |
|---|---|---|
| FastAPI | Horizontal autoscale on CPU utilization | Stateless; use HPA |
| Celery workers | Scale with queue depth | Prefork pool supports task termination |
| Ray workers | Autoscale CPU/GPU pools | Use Ray autoscaler or managed Ray policies (see sketch below) |
| Qdrant | Scale vertically (RAM) or shard by namespace | Monitor vector count and query latency |
| MongoDB | Use managed cluster (Atlas, DocumentDB) | Ensure indexes on namespace_id, internal_id |
| Redis | Scale vertically or cluster | Used for task queue + caching |
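For the Ray worker pools, autoscaling can also be expressed on the Serve deployment itself, so GPU replicas scale with load. The replica bounds, GPU count, and deployment name below are illustrative, not recommended values.

```python
# Sketch: a Ray Serve deployment whose GPU replicas autoscale with request load.
# Replica bounds and num_gpus are illustrative placeholders.
from ray import serve


@serve.deployment(
    autoscaling_config={"min_replicas": 1, "max_replicas": 8},
    ray_actor_options={"num_gpus": 1},  # pin each replica to one GPU worker
)
class Embedder:
    def __call__(self, request) -> dict:
        # A real implementation would run the embedding model here.
        return {"ok": True}


serve.run(Embedder.bind())
```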
## Deployment Checklist
- Provision MongoDB, Qdrant, Redis, and S3/GCS buckets (with IAM roles); see the index-creation sketch after this checklist.
- Deploy Ray cluster (head + workers) and confirm job submission works.
- Deploy FastAPI + Celery services; configure environment variables to point to Ray + data stores.
- Configure ingress/HTTPS, secrets, and network policies.
- Run health checks and quickstart workflow to verify end-to-end functionality.
- Set up observability (logs, metrics, webhooks) and configure backups for MongoDB/Qdrant.
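When provisioning MongoDB, the indexes called out in the scaling table (`namespace_id`, `internal_id`) can be created idempotently at deploy time. The database and collection names below are placeholders for the actual schema.

```python
# Ensure the MongoDB indexes the API relies on (namespace_id, internal_id) exist.
# Database and collection names are placeholders for your actual schema.
import os

from pymongo import ASCENDING, MongoClient

client = MongoClient(os.environ["MONGO_URI"])
db = client["app"]

for name in ("objects", "collections"):
    # create_index is idempotent, so this is safe to run on every deploy.
    db[name].create_index([("namespace_id", ASCENDING)])
    db[name].create_index([("namespace_id", ASCENDING), ("internal_id", ASCENDING)])
```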
## References
- Architecture – full system design
- Observability – metrics, logs, dashboards
- Security – tenancy, auth, secret management
- Webhooks – event processing pipeline

