Batches group previously registered object IDs and submit them for asynchronous processing. Use batching for larger or scheduled ingestions. Batches operate on object IDs; they do not upload files.
Overview
- Purpose: Organize many objects into a single processing job.
- Scope: Batches are created per bucket.
- Flow: Create batch → add objects → submit for processing → monitor tasks.
When to use batching
- Large backfills: Process thousands or millions of objects in controlled chunks
- Scheduled ingestion: Nightly/weekly jobs without manual triggers
- Load smoothing: Avoid spikes from many single-object submissions
- Consistent snapshot: Group a known set of objects for reproducible results
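To illustrate the "controlled chunks" idea for large backfills, here is a minimal sketch that splits a long list of object IDs into fixed-size groups, each of which could become its own batch. The helper name `chunk_object_ids`, the chunk size of 1000, and the `obj_N` IDs are illustrative assumptions, not part of the API.

```python
from typing import Iterator, List, Sequence


def chunk_object_ids(object_ids: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of object_ids, each holding at most chunk_size IDs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(object_ids), chunk_size):
        yield list(object_ids[start:start + chunk_size])


# Hypothetical IDs; real IDs come from objects already registered in the bucket.
ids = [f"obj_{n}" for n in range(2500)]
chunks = list(chunk_object_ids(ids, 1000))
print([len(c) for c in chunks])  # [1000, 1000, 500]
```

A suitable chunk size depends on limits this page does not state, so treat the value as a tuning parameter.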
Typical flow
- Create a batch with object IDs (or create empty and add later)
- Add objects to the batch (needed only if object IDs were not provided at creation)
- Submit the batch for processing
- Track progress via Tasks
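The steps above can be sketched as three POST calls against the paths documented on this page. The sketch takes the HTTP transport as a parameter so it runs without a server. The request body field `object_ids` and the response field `batch_id` are assumptions; the real names are in the API Reference.

```python
from typing import Any, Callable, Dict, List, Tuple

# A transport takes (method, path, json_body) and returns the decoded JSON response.
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def run_batch(send: Transport, bucket: str, object_ids: List[str]) -> str:
    """Create an empty batch, add objects, submit it, and return the batch ID."""
    base = f"/v1/buckets/{bucket}/batches"
    created = send("POST", base, {})  # body shape for an empty batch is assumed
    batch_id = created["batch_id"]  # assumed response field
    send("POST", f"{base}/{batch_id}/objects", {"object_ids": object_ids})  # assumed body field
    send("POST", f"{base}/{batch_id}/submit", {})
    return batch_id


# Stand-in transport that records calls instead of contacting a server.
calls: List[Tuple[str, str, Dict[str, Any]]] = []


def fake_send(method: str, path: str, body: Dict[str, Any]) -> Dict[str, Any]:
    calls.append((method, path, body))
    return {"batch_id": "batch_123"}  # hypothetical ID


batch_id = run_batch(fake_send, "my-bucket", ["obj_1", "obj_2"])
for method, path, _ in calls:
    print(method, path)
```

Swapping `fake_send` for a function that performs real HTTP requests would turn the sketch into a working client, once the assumed field names are checked against the reference.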
Create a batch
- API: Create Batch
- Method: POST
- Path: /v1/buckets/{bucket_identifier}/batches
- Reference: API Reference
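As a rough illustration, the Create Batch call could be assembled with Python's standard library as shown below. The host `api.example.com`, the bearer-token header, and the `object_ids` body field are assumptions made for this sketch; only the method and path come from this page.

```python
import json
import urllib.request
from typing import List
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # hypothetical host


def build_create_batch_request(bucket: str, object_ids: List[str], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Create Batch endpoint."""
    url = f"{BASE_URL}/v1/buckets/{quote(bucket, safe='')}/batches"
    body = json.dumps({"object_ids": object_ids}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


req = build_create_batch_request("my-bucket", ["obj_1", "obj_2"], "API_KEY")
print(req.get_method(), req.full_url)
```

The request is only constructed here; passing it to `urllib.request.urlopen` would send it.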
Add objects to a batch
- API: Add Objects to Batch
- Method: POST
- Path: /v1/buckets/{bucket_identifier}/batches/{batch_id}/objects
- Reference: API Reference
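A minimal sketch of preparing an Add Objects to Batch call follows. It removes repeated IDs on the client side before building the request, which keeps payloads small but is optional. The function name and the `object_ids` body field are assumptions; the method and path are the ones listed above.

```python
from typing import Any, Dict, Iterable, List, Tuple
from urllib.parse import quote


def build_add_objects_call(
    bucket: str, batch_id: str, object_ids: Iterable[str]
) -> Tuple[str, str, Dict[str, Any]]:
    """Return (method, path, json_body) for adding objects to a batch."""
    # dict.fromkeys drops repeats while keeping first-seen order.
    unique_ids: List[str] = list(dict.fromkeys(object_ids))
    path = (
        f"/v1/buckets/{quote(bucket, safe='')}"
        f"/batches/{quote(batch_id, safe='')}/objects"
    )
    return "POST", path, {"object_ids": unique_ids}  # assumed field name


method, path, body = build_add_objects_call(
    "my-bucket", "batch_123", ["obj_1", "obj_2", "obj_1"]
)
print(method, path, body)
```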
Submit batch for processing
- API: Submit Batch for Processing
- Method: POST
- Path: /v1/buckets/{bucket_identifier}/batches/{batch_id}/submit
- Reference: API Reference
What happens after submit
- The Engine runs the feature extractors configured for the downstream collections
- Documents are written into target collections with lineage and features
- Track status via Tasks
Example scenario
You import 50k new product assets each week. Create a batch with the new object IDs on Friday, submit it, and monitor the task. By Monday, enriched documents and vectors are available in your products_v1 collection for retrieval.
Monitor and manage
- Track jobs: Tasks
- List batches: List Batches
- Delete batch: Delete Batch
Behavior & validation
- Bucket-scoped: A batch belongs to a bucket; objects must come from that bucket.
- Status lifecycle: Batches are created as draft, populated with objects, then submitted for processing.
- Requirements: A batch must contain at least one object before it can be submitted.
- Idempotency: Adding an object that is already in the batch has no effect; the duplicate is ignored.
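The rules above can be modeled locally to catch mistakes before calling the API. The class below is a client-side stand-in, not the service's implementation. The status name "submitted" and the rule that objects can be added only while the batch is a draft are assumptions inferred from the lifecycle description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DraftBatch:
    """Client-side model of a batch, used only to pre-check requests."""
    bucket: str
    status: str = "draft"
    object_ids: List[str] = field(default_factory=list)

    def add(self, object_id: str, object_bucket: str) -> None:
        if self.status != "draft":  # assumed: no additions after submit
            raise RuntimeError("objects can only be added while the batch is a draft")
        if object_bucket != self.bucket:  # bucket-scoped rule
            raise ValueError("object belongs to a different bucket")
        if object_id not in self.object_ids:  # a repeated ID is ignored
            self.object_ids.append(object_id)

    def submit(self) -> None:
        if not self.object_ids:  # at least one object is required
            raise ValueError("add at least one object before submitting")
        self.status = "submitted"  # assumed status name


batch = DraftBatch(bucket="my-bucket")
batch.add("obj_1", "my-bucket")
batch.add("obj_1", "my-bucket")  # duplicate, ignored
batch.submit()
print(batch.status, batch.object_ids)
```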