1. Create batch: Supply object IDs (or create an empty batch and add objects later).
2. Submit batch: The API flattens manifests into per-extractor Parquet artifacts and writes them to S3.
3. Engine processes: Ray pollers pick up the batch, execute extractors tier by tier, and write documents to Qdrant.
4. Webhook & cache updates: The engine emits webhook events, Celery Beat invalidates caches, and collection schemas update.
Create a Batch
Submit for Processing
- `include_processing_history=true` records each enrichment operation in `internal_metadata.processing_history`.
- Response contains a `task_id`; poll `/v1/tasks/{task_id}` or the batch resource directly.
Lifecycle & Status
| Status | Meaning |
|---|---|
| `DRAFT` | Created but not submitted |
| `QUEUED` | Submitted; waiting for poller pickup |
| `PROCESSING` | Ray job running feature extractors |
| `COMPLETED` | All extractors finished successfully |
| `FAILED` | Extractors or Ray job failed (see `error_message`) |
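
The lifecycle above lends itself to a simple polling loop that stops at a terminal status. The following is a minimal sketch, not the official client: `wait_for_batch` and its `fetch_batch` argument are hypothetical names, and it assumes the batch resource is returned as a dict with a `status` key. Only the status values come from the table.

```python
import time
from typing import Callable

# Statuses from the table above; the terminal ones end the wait.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}
KNOWN_STATUSES = {"DRAFT", "QUEUED", "PROCESSING"} | TERMINAL_STATUSES


def wait_for_batch(
    fetch_batch: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the batch is COMPLETED or FAILED, or the timeout elapses.

    fetch_batch is a hypothetical callable (for example, a wrapper around a
    GET on the batch resource) that returns a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        batch = fetch_batch()
        status = batch.get("status")
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected batch status: {status!r}")
        if status in TERMINAL_STATUSES:
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still {status} after {timeout_s}s")
        sleep(interval_s)
```

Passing the fetcher in as a callable keeps the loop independent of any particular HTTP library. On `FAILED`, the caller can inspect `error_message` on the returned dict.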
Under the Hood
- API writes manifest metadata to MongoDB and extractor row artifacts to S3.
- Ray poller queries MongoDB every 5 seconds for `PENDING` batches.
- Poller submits a Ray job with manifest details.
- Worker downloads artifacts, runs extractors in dependency tiers, and writes documents to Qdrant/MongoDB.
- QdrantBatchProcessor emits webhook events and updates collection index signatures.
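
To illustrate what running extractors "in dependency tiers" can look like, here is a small sketch that groups extractors so each tier depends only on earlier tiers. It is not the engine's actual scheduler: `dependency_tiers` and the extractor names are hypothetical, and the grouping uses the standard-library `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter


def dependency_tiers(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group extractors into tiers; each tier depends only on earlier tiers.

    deps maps an extractor name to the set of extractors it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    tiers: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        tiers.append(ready)
        sorter.done(*ready)  # unblock extractors that depended on this tier
    return tiers


# Hypothetical extractor graph, for illustration only.
example = {
    "transcribe": set(),
    "thumbnail": set(),
    "text_embed": {"transcribe"},
    "summary": {"text_embed"},
}
print(dependency_tiers(example))
```

Extractors within one tier have no dependencies on each other, so a worker could run them in parallel before moving to the next tier.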
Monitoring
- `GET /v1/buckets/<bucket_id>/batches/<batch_id>` – check batch status and manifest metadata.
- `GET /v1/tasks/<task_id>` – track task-level progress (Redis TTL ≈ 24h).
- Webhook events (`collection.documents.written`) notify you when documents land.
- Analytics (coming soon) provide throughput metrics for Extractor + Engine performance.
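
On the receiving side, webhook events can be routed by event name. The sketch below is an assumption-heavy illustration: the payload shape (a JSON object with an `event_type` field) and the names `on`, `dispatch`, and `handle_documents_written` are hypothetical. Only the event name `collection.documents.written` comes from this page.

```python
import json
from typing import Callable

Handler = Callable[[dict], None]
HANDLERS: dict[str, Handler] = {}


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Register a handler for one webhook event name."""
    def register(fn: Handler) -> Handler:
        HANDLERS[event_type] = fn
        return fn
    return register


written_events: list[dict] = []


@on("collection.documents.written")
def handle_documents_written(event: dict) -> None:
    # Replace with real work, e.g. refreshing a downstream index.
    written_events.append(event)


def dispatch(raw_body: bytes) -> bool:
    """Route a raw webhook body; return False for events with no handler.

    Assumes (hypothetically) that the event name lives in "event_type".
    """
    event = json.loads(raw_body)
    handler = HANDLERS.get(event.get("event_type"))
    if handler is None:
        return False
    handler(event)
    return True
```

Returning `False` for unknown events lets the HTTP layer acknowledge them without failing, which keeps the receiver tolerant of event types added later.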
Scaling Tips
- Chunk large imports into batches of 1k–10k objects to keep pollers responsive.
- Parallelize submissions: pollers handle multiple batches concurrently.
- Use namespaces to isolate environments; pollers are namespace-aware.
- Retry safely: batch submission and task polling are idempotent.
- Pipeline scheduling: use Celery Beat or your own orchestrator to submit batches on a cron schedule.
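
The chunking and parallel-submission tips can be combined in a few lines. This is a sketch under stated assumptions: `chunked` and `submit_all` are hypothetical helpers, and `submit_batch` stands in for whatever function creates and submits one batch from a list of object IDs in your client code.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(object_ids: Iterable[str], size: int = 5_000) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` object IDs."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(object_ids)
    while chunk := list(islice(it, size)):
        yield chunk


def submit_all(
    object_ids: Iterable[str],
    submit_batch: Callable[[list[str]], T],
    size: int = 5_000,
    max_workers: int = 4,
) -> list[T]:
    """Submit one batch per chunk concurrently; results keep chunk order.

    submit_batch is a hypothetical callable that creates and submits a batch.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit_batch, chunked(object_ids, size)))
```

The default of 5,000 sits inside the 1k–10k range suggested above; `max_workers` is an arbitrary starting point to tune against your rate limits.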
Related APIs
- Create Batch
- Add Objects to Batch
- Submit Batch
- List Batches
- Delete Batch
- Tasks for status and error handling

