Add Objects to Batch

Authorizations

Authorization

string

header

required

Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings.

Headers

Authorization

string

required

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer sk_live_abc123def456"

"Bearer sk_test_xyz789"

X-Namespace

string

required

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'

Examples:

"ns_abc123def456"

"production"

"my-namespace"

Path Parameters

bucket_identifier

string

required

The unique identifier of the bucket.

batch_id

string

required

The unique identifier of the batch.

Body

application/json

The request model for adding objects to an existing batch.

object_ids

string[]

required

A list of object IDs to add to the batch.

Minimum length: 1

Examples:

["object_789", "object_101"]

Response

Successful Response

Model representing a batch of objects for processing through collections.

A batch groups bucket objects together for processing through one or more collections. Batches support multi-tier processing where collections are processed in dependency order (e.g., bucket → chunks → frames → scenes). Each tier has independent task tracking.

Use Cases: - Process multiple objects through collections in a single batch - Track progress of multi-tier decomposition pipelines - Monitor and retry individual processing tiers - Query batch status and tier-specific task information

Lifecycle: 1. Created in DRAFT status with object_ids 2. Submitted for processing → status changes to PENDING 3. Each tier processes sequentially (tier 0 → tier 1 → ... → tier N) 4. Batch completes when all tiers finish (status=COMPLETED) or any tier fails (status=FAILED)

Multi-Tier Processing: - Tier 0: Bucket objects → Collections (bucket as source) - Tier N (N > 0): Collection documents → Collections (upstream collection as source) - Each tier gets independent task tracking via tier_tasks array - Processing proceeds tier-by-tier with automatic chaining

Requirements: - batch_id: OPTIONAL (auto-generated if not provided) - bucket_id: REQUIRED - status: OPTIONAL (defaults to DRAFT) - object_ids: REQUIRED for processing (must have at least 1 object) - collection_ids: OPTIONAL (discovered via DAG resolution) - tier_tasks: OPTIONAL (populated during processing) - current_tier: OPTIONAL (set during processing) - total_tiers: OPTIONAL (defaults to 1, set during DAG resolution) - dag_tiers: OPTIONAL (populated during DAG resolution)

bucket_id

string

required

REQUIRED. Unique identifier of the bucket containing the objects to process. Must be a valid bucket ID that exists in the system. All object_ids must belong to this bucket. Format: Bucket ID as defined when bucket was created.

Examples:

"bkt_videos"

"bkt_documents_q4"

batch_id

string

OPTIONAL (auto-generated if not provided). Unique identifier for this batch. Format: 'btch_' prefix followed by 12-character secure token. Generated using generate_secure_token() from shared.utilities.helpers. Used to query batch status and track processing across tiers. Immutable after creation.

Examples:

"btch_abc123xyz789"

"btch_video_batch_01"

status

enum<string>

OPTIONAL (defaults to DRAFT). Current processing status of the batch. Lifecycle: DRAFT → PENDING → IN_PROGRESS → COMPLETED/FAILED. DRAFT: Batch created but not yet submitted. PENDING: Batch submitted and queued for processing. IN_PROGRESS: Batch currently processing (one or more tiers active). COMPLETED: All tiers successfully completed. FAILED: One or more tiers failed. Aggregated from tier_tasks statuses during multi-tier processing.

Available options:

PENDING,

IN_PROGRESS,

PROCESSING,

COMPLETED,

COMPLETED_WITH_ERRORS,

FAILED,

CANCELED,

UNKNOWN,

SKIPPED,

DRAFT,

ACTIVE,

ARCHIVED,

SUSPENDED

object_ids

string[]

REQUIRED for processing (must have at least 1). List of object IDs to include in this batch. All objects must exist in the specified bucket_id. These objects are the source data for tier 0 processing. Minimum 1 object, no maximum limit. Objects are processed in parallel within each tier.

Minimum length: 1

Examples:

["obj_video_001", "obj_video_002"]

["obj_doc_123"]

collection_ids

string[] | null

OPTIONAL. List of all collection IDs involved in this batch's processing. Automatically populated during DAG resolution from dag_tiers. Includes collections from all tiers (flattened view of dag_tiers). Used for quick lookups without traversing tier structure. Format: List of collection IDs across all tiers.

Examples:

["col_chunks"]

["col_chunks", "col_frames", "col_scenes"]

error

string | null

OPTIONAL. Error message if the batch or any tier failed. None if batch succeeded or is still processing. Contains human-readable error description. For multi-tier batches, typically contains the error from the first failed tier. Check tier_tasks array for tier-specific error details.

Examples:

"Failed to process batch: Object not found"

"Tier 1 failed: Out of memory during frame extraction"

"Collection 'col_frames' not found"

type

enum<string>

OPTIONAL (defaults to BUCKET). Type of batch. BUCKET: Standard batch processing bucket objects through collections. COLLECTION: Reserved for future collection-only batch processing. Currently only BUCKET type is supported.

Available options:

BUCKET,

COLLECTION

manifest_key

string | null

OPTIONAL. S3 key where the batch manifest is stored. Contains metadata and row data (Parquet) for Engine processing. For tier 0, points to bucket object manifest. For tier N+, points to collection document manifest. Format: S3 path (e.g., 'namespace_id/internal_id/manifests/tier_0.parquet'). Generated during batch submission.

Examples:

"ns_abc/org_123/manifests/tier_0.parquet"

"ns_xyz/org_456/manifests/tier_1.parquet"

task_id

string | null

OPTIONAL. Primary task ID for the batch (typically tier 0 task). Used for backward compatibility with single-tier batch tracking. For multi-tier batches, prefer querying tier_tasks array for granular tracking. Format: Task ID as generated for tier 0.

Examples:

"task_tier0_abc123"

"task_batch_001"

loaded_object_ids

string[] | null

OPTIONAL. List of object IDs that were successfully validated and loaded into the batch. Subset of object_ids that passed validation. Used to track which objects are ready for processing. None if batch hasn't been validated yet.

Examples:

["obj_video_001", "obj_video_002"]

internal_metadata

object | null

OPTIONAL. Internal engine/job metadata for system use. May contain: job_id (provider-specific), engine_version, processing hints, last_health_check. last_health_check: Most recent health check results with health_status, enriched_documents, vector_populated_count, stall_duration_seconds, recommendations, missing_features. Populated asynchronously via Celery task (non-blocking, best-effort). Used for troubleshooting batch processing issues via API. Format: Free-form dictionary for internal tracking.

Examples:

{
  "include_processing_history": true,
  "last_health_check": {
    "enriched_documents": 98,
    "health_status": "HEALTHY",
    "missing_features": ["text_embedding"],
    "processed_documents": 100,
    "recommendations": [],
    "stall_duration_seconds": 0,
    "timestamp": "2025-11-06T10:05:00Z",
    "total_documents": 100,
    "vector_populated_count": 98
  }
}

metadata

object

OPTIONAL. User-defined metadata for the batch. Arbitrary key-value pairs for user tracking and organization. Persisted with the batch and returned in API responses. Not used by the system for processing logic. Examples: campaign_id, user_email, processing_notes.

Examples:

{
  "campaign_id": "Q4_2025",
  "priority": "high"
}

{
  "project": "video_analysis",
  "user_email": "[email protected]"
}

tier_tasks

TierTaskInfo · object[]

OPTIONAL. List of tier task tracking information for multi-tier processing. Each element represents one tier in the processing pipeline. Empty array for simple single-tier batches. Populated during batch submission with tier 0 info, then appended as tiers progress. Each TierTaskInfo contains: tier_num, task_id, status, collection_ids, timestamps. Used for granular monitoring: 'Show me status of tier 2' or 'Retry tier 1'. Array index typically matches tier_num (tier_tasks[0] = tier 0, tier_tasks[1] = tier 1, etc.).

Show child attributes

Examples:

[]

[
  {
    "collection_ids": ["col_chunks"],
    "status": "COMPLETED",
    "task_id": "task_tier0_abc",
    "tier_num": 0
  }
]

current_tier

integer | null

OPTIONAL. Zero-based index of the currently processing tier. None if batch hasn't started processing (status=DRAFT or PENDING). Updated as batch progresses through tiers. Used to show processing progress: 'Processing tier 2 of 5'. Set to last tier number when batch completes. Example: If processing tier 1 (frames), current_tier=1.

Required range: x >= 0

Examples:

0

1

2

total_tiers

integer

default:1

OPTIONAL (defaults to 1). Total number of tiers in the collection DAG. Minimum 1 (tier 0 only = bucket → collection). Set during DAG resolution when batch is submitted. Equals len(dag_tiers) if dag_tiers is populated. Used to calculate progress: current_tier / total_tiers. Example: 5-tier pipeline (bucket → chunks → frames → scenes → summaries) has total_tiers=5.

Required range: x >= 1

Examples:

1

3

5

dag_tiers

string[][] | null

OPTIONAL. Complete DAG tier structure for this batch. List of tiers, where each tier is a list of collection IDs to process at that stage. Tier 0 = bucket-sourced collections. Tier N (N > 0) = collection-sourced collections. Collections within same tier have no dependencies (can run in parallel). Collections in tier N+1 depend on collections in tier N. Populated during DAG resolution at batch submission. Used for tier-by-tier processing orchestration. Example: [['col_chunks'], ['col_frames', 'col_objects'], ['col_scenes']] = 3 tiers where frames and objects run in parallel at tier 1.

Examples:

[["col_chunks"]]

[["col_chunks"], ["col_frames"]]

[
  ["col_chunks"],
  ["col_frames", "col_objects"],
  ["col_scenes"]
]

created_at

string<date-time>

OPTIONAL (auto-set on creation). ISO 8601 timestamp when batch was created. Set using current_time() from shared.utilities.helpers. Immutable after creation. Used for batch age tracking and cleanup of old batches.

Examples:

"2025-11-03T10:00:00Z"

updated_at

string<date-time>

OPTIONAL (auto-updated). ISO 8601 timestamp when batch was last modified. Updated using current_time() whenever batch status or tier_tasks change. Used to track batch activity and identify stale batches.

Examples:

"2025-11-03T10:30:00Z"

Health

Organizations

Namespaces

Buckets

Feature Extractors

Collections

Retrievers

Taxonomies

Clusters

Analytics

Tasks

Webhooks

Add Objects to Batch

Authorizations

Headers

Path Parameters

Body

Response