Mixpeek is split into two deployable components:
  • API Layer – FastAPI + Celery + Redis (HTTP endpoints, task orchestration, webhooks).
  • Engine Layer – Ray cluster + Ray Serve (extractors, inference, clustering, taxonomy runs).
Shared dependencies: MongoDB, Qdrant, Redis, and S3-compatible object storage.

Local Development

The ./start.sh script spins up a full stack with Docker Compose:
./start.sh api      # FastAPI + Celery
./start.sh celery   # Celery Beat
./start.sh engine   # Ray head + workers
Docker Compose services:
  • mongodb – metadata (mongodb://localhost:27017)
  • qdrant – vector storage (http://localhost:6333)
  • redis – task queue/cache (redis://localhost:6379)
  • localstack – S3 emulator (http://localhost:4566)
Run curl http://localhost:8000/v1/health to confirm readiness, then follow the Quickstart.
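If the health check fails, these commands poke each dependency directly (ports are the Compose defaults above; redis-cli, mongosh, and the AWS CLI are assumed to be installed):
curl -s http://localhost:6333/collections                         # Qdrant REST API
redis-cli -u redis://localhost:6379 ping                          # expects PONG
mongosh "mongodb://localhost:27017" --eval "db.runCommand({ ping: 1 })"
aws --endpoint-url http://localhost:4566 s3 ls                    # LocalStack S3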

Production Topology (Kubernetes)

Namespace: mixpeek-api
  ├─ fastapi-deployment (ReplicaSet + HPA)
  ├─ celery-worker-deployment (process tasks)
  └─ celery-beat-deployment (1 replica scheduler)

Namespace: mixpeek-engine
  ├─ ray-head (StatefulSet)
  ├─ ray-worker-cpu (Autoscaled Deployment)
  └─ ray-worker-gpu (Autoscaled Deployment)

Namespace: mixpeek-data
  ├─ mongodb (StatefulSet, replica set)
  ├─ qdrant (StatefulSet or distributed cluster)
  └─ redis (Deployment or Redis Cluster)
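After rollout, a quick confirmation of the topology (a minimal sketch assuming kubectl access and the namespace names above):
kubectl get pods -n mixpeek-api
kubectl get pods -n mixpeek-engine
kubectl get pods -n mixpeek-data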
Recommended node pools:
  • API nodes – general purpose (e.g., t3.xlarge), scale FastAPI/Celery horizontally.
  • CPU workers – compute-optimized (e.g., c5.4xlarge) for text extraction, clustering.
  • GPU workers – GPU instances (e.g., p3.2xlarge) for embeddings, rerankers, video processing.
Expose the API via an ingress or load balancer; keep Ray Serve internal unless exposing custom inference endpoints.
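As a starting point for API-layer scaling, the imperative HPA below targets the FastAPI deployment (a sketch: the deployment name comes from the topology above, and the thresholds are illustrative, not Mixpeek defaults):
kubectl autoscale deployment fastapi-deployment -n mixpeek-api \
  --cpu-percent=70 --min=2 --max=10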

Managed Ray (Anyscale / Ray Service)

  • Deploy the Engine layer via a managed Ray service.
  • Point the API layer to the Ray cluster using ENGINE_API_URL and Ray job submission credentials (see the sketch after this list).
  • Managed Ray handles autoscaling, node health, and GPU provisioning; you manage API + data stores.
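A minimal sketch of verifying job submission from the API side (RAY_ADDRESS is Ray's standard cluster address variable; the endpoint value is a placeholder, and authentication depends on your provider):
export RAY_ADDRESS="http://<your-ray-endpoint>:8265"   # placeholder address
ray job submit --address "$RAY_ADDRESS" -- python -c "import ray; ray.init(); print(ray.cluster_resources())"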

Core Environment Variables

  • API – MONGO_URI, QDRANT_URL, REDIS_URL, S3_BUCKET, ENGINE_API_URL (connectivity).
  • Engine – MONGO_URI, QDRANT_URL, S3_BUCKET, RAY_memory, RAY_num_gpus (runtime configuration).
  • Shared – ENABLE_ANALYTICS, OPENAI_API_KEY, ANTHROPIC_API_KEY, etc. (optional providers).
Secrets should be injected via Kubernetes secrets, environment managers, or cloud secret stores.
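For the Kubernetes route, a hedged example of creating such a secret (the secret name is illustrative and the values are stand-ins, not working credentials):
kubectl create secret generic mixpeek-api-env -n mixpeek-api \
  --from-literal=MONGO_URI="mongodb://..." \
  --from-literal=OPENAI_API_KEY="sk-..."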

Health & Verification

  • Endpoint: GET /v1/health – checks Redis, MongoDB, Qdrant, Celery, the Engine, and ClickHouse (if enabled); a quick check follows this list.
  • Smoke test: create namespace → bucket → collection → upload object → submit batch → execute retriever.
  • Tasks: ensure Celery workers process webhook events, cache invalidations, and maintenance tasks.
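As a minimal sketch of that endpoint check (assumes jq is installed; the response shape isn't documented here, so inspect the output for per-dependency statuses):
curl -s http://localhost:8000/v1/health | jq .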

Scaling Guidelines

  • FastAPI – horizontal autoscale on CPU utilization; stateless, use an HPA.
  • Celery workers – scale with queue depth; the prefork pool supports task termination.
  • Ray workers – autoscale CPU/GPU pools; use the Ray autoscaler or managed Ray policies.
  • Qdrant – scale vertically (RAM) or shard by namespace; monitor vector count and query latency.
  • MongoDB – use a managed cluster (Atlas, DocumentDB); ensure indexes on namespace_id and internal_id.
  • Redis – scale vertically or cluster; used for the task queue and caching.
Monitor the Ray dashboard (port 8265) for job status, resource utilization, and Serve deployments.
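To reach the dashboard inside the Kubernetes topology, a hedged sketch (the ray-head Service name is an assumption based on the StatefulSet above; ray status runs on the head node):
kubectl -n mixpeek-engine port-forward svc/ray-head 8265:8265   # then open http://localhost:8265
ray status                                                      # node resources + autoscaler activity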

Deployment Checklist

  1. Provision MongoDB, Qdrant, Redis, and S3/GCS buckets (with IAM roles).
  2. Deploy Ray cluster (head + workers) and confirm job submission works.
  3. Deploy FastAPI + Celery services; configure environment variables to point to Ray + data stores.
  4. Configure ingress/HTTPS, secrets, and network policies.
  5. Run health checks and quickstart workflow to verify end-to-end functionality.
  6. Set up observability (logs, metrics, webhooks) and configure backups for MongoDB/Qdrant (example commands below).
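For step 6, hedged backup examples (the output path and collection name are illustrative; the snapshot call is Qdrant's standard per-collection REST endpoint):
mongodump --uri "$MONGO_URI" --out "/backups/mongo-$(date +%F)"
curl -X POST "$QDRANT_URL/collections/<collection>/snapshots"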
