Tasks provide a uniform way to monitor long-running operations (batch processing, clustering, taxonomy materialization, namespace migrations, etc.). Every task exposes a status from the shared TaskStatusEnum.

TaskStatusEnum

PENDING → PROCESSING → COMPLETED (or FAILED)
Additional values include IN_PROGRESS, CANCELED, SKIPPED, UNKNOWN, DRAFT, ACTIVE, ARCHIVED, and SUSPENDED. All async resources in Mixpeek adopt this enum, so your polling logic works everywhere.
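
A client can mirror the enum locally so that status checks are not scattered string comparisons. The sketch below is an illustration, not part of any Mixpeek SDK: the member names are the values listed above, while `TaskStatus`, `TERMINAL_STATUSES`, and `is_terminal` are hypothetical names, and treating COMPLETED, FAILED, and CANCELED as the terminal set follows the Best Practices list on this page.

```python
from enum import Enum


class TaskStatus(str, Enum):
    """Client-side mirror of the status values named on this page."""
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    IN_PROGRESS = "IN_PROGRESS"
    CANCELED = "CANCELED"
    SKIPPED = "SKIPPED"
    UNKNOWN = "UNKNOWN"
    DRAFT = "DRAFT"
    ACTIVE = "ACTIVE"
    ARCHIVED = "ARCHIVED"
    SUSPENDED = "SUSPENDED"


# Assumption: only these three statuses end a polling loop.
TERMINAL_STATUSES = frozenset(
    {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELED}
)


def is_terminal(status: str) -> bool:
    """Return True when polling should stop; unrecognized strings map to UNKNOWN."""
    try:
        parsed = TaskStatus(status)
    except ValueError:
        parsed = TaskStatus.UNKNOWN
    return parsed in TERMINAL_STATUSES
```

Mapping unrecognized strings to UNKNOWN keeps a poller running if the server starts returning a value the client has not seen before, rather than crashing on it.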

Anatomy of a Task

{
  "task_id": "tsk_processing_123",
  "task_type": "api_buckets_batches_process",
  "status": "PROCESSING",
  "inputs": ["batch_xyz789"],
  "outputs": null,
  "additional_data": {
    "batch_id": "batch_xyz789",
    "bucket_id": "bkt_products",
    "job_id": "ray_job_123"
  },
  "error_message": null
}
  • Cached in Redis for ~24 hours (fast lookup).
  • Persisted in MongoDB for historical auditing.
  • additional_data stores resource-specific details (e.g., Ray job IDs).
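
To make the shape concrete, here is a minimal sketch that parses the payload above into a typed object. The `Task` dataclass and `parse_task` helper are hypothetical names introduced for illustration. The field list is taken only from the example shown here, so a real response may carry more fields.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Task:
    task_id: str
    task_type: str
    status: str
    inputs: list = field(default_factory=list)
    outputs: Optional[Any] = None
    additional_data: dict = field(default_factory=dict)
    error_message: Optional[str] = None


def parse_task(raw: str) -> Task:
    """Build a Task from a JSON string, tolerating null or missing optional fields."""
    data = json.loads(raw)
    return Task(
        task_id=data["task_id"],
        task_type=data["task_type"],
        status=data["status"],
        inputs=data.get("inputs") or [],
        outputs=data.get("outputs"),
        additional_data=data.get("additional_data") or {},
        error_message=data.get("error_message"),
    )


# The example payload from this section.
example = """
{
  "task_id": "tsk_processing_123",
  "task_type": "api_buckets_batches_process",
  "status": "PROCESSING",
  "inputs": ["batch_xyz789"],
  "outputs": null,
  "additional_data": {
    "batch_id": "batch_xyz789",
    "bucket_id": "bkt_products",
    "job_id": "ray_job_123"
  },
  "error_message": null
}
"""
task = parse_task(example)
```

Reading `bucket_id` and `batch_id` out of `additional_data` at this point is useful later, because those are the identifiers needed to poll the underlying batch once the task record is no longer cached.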

Polling Strategy

  1. Poll the task. Query /v1/tasks/{task_id} with exponential backoff (start at 1s, cap at 30s).
  2. Handle 404 gracefully. After the Redis TTL expires you may receive a 404; fall back to the underlying resource (batch, cluster, etc.).
  3. Switch to resource polling. Use /v1/buckets/{bucket_id}/batches/{batch_id}, /v1/clusters/{cluster_id}, etc., for operations that can outlive the ~24-hour task cache.

Example hybrid poller:
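import time

# get_task, get_batch, and NotFound stand in for your API client's
# calls and its 404 error type.
delay = 1.0  # seconds; start at 1s
while True:
    try:
        task = get_task(task_id)
    except NotFound:
        # The task record has expired from the cache; poll the batch itself.
        task = get_batch(bucket_id, batch_id)

    if task.status == "COMPLETED":
        break
    if task.status in ("FAILED", "CANCELED"):
        # Both are terminal; stop polling and surface the reason.
        raise RuntimeError(task.error_message or task.status)
    time.sleep(delay)
    delay = min(delay * 1.5, 30)  # back off, capped at 30s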
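
The loop above runs until a terminal status arrives. A variant that also gives up after a deadline can be sketched as follows. This is an illustration under stated assumptions: `backoff_delays` and `wait_for_status` are hypothetical helpers, the status lookup is injected as a callable rather than tied to a particular client, and the growth factor of 2.0 is a choice made for this sketch (the loop above uses 1.5).

```python
import time
from typing import Callable, Iterator


def backoff_delays(
    start: float = 1.0, cap: float = 30.0, factor: float = 2.0
) -> Iterator[float]:
    """Yield start, start*factor, ... without ever exceeding cap."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def wait_for_status(
    fetch_status: Callable[[], str],
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status until a terminal status, or raise after timeout seconds."""
    deadline = clock() + timeout
    for delay in backoff_delays():
        status = fetch_status()
        if status in ("COMPLETED", "FAILED", "CANCELED"):
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        sleep(delay)
```

The callable passed as `fetch_status` is where the 404 fallback belongs: it can try the task endpoint first and read the batch or cluster resource when the task record is gone. Injecting `sleep` and `clock` keeps the function testable without real waiting.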

Webhooks & Notifications

  • The Engine emits webhook events (e.g., collection.documents.written) when tasks complete relevant work.
  • Celery Beat dispatches those events to invalidate caches, update schemas, and notify external systems.
  • Prefer webhooks for near-real-time updates instead of aggressive polling.

Managing Tasks

  • GET /v1/tasks/{task_id} – fetch the latest status.
  • POST /v1/tasks/list – filter by type, status, namespace, or creation time.
  • POST /v1/tasks/{task_id}/kill – request cancellation (supported for batches and clustering jobs using Celery’s AbortableAsyncResult).
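
The three endpoints above can be wrapped in small request builders. The sketch below uses only Python's standard library and constructs requests without sending them. Several things are assumptions: `BASE_URL` is a placeholder rather than the real host, the function names are hypothetical, authentication headers are omitted, and the filter keys in the list body are illustrative, since this page names the filterable properties but not the request schema.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com"  # placeholder host, not the real API


def get_task_request(task_id: str) -> urllib.request.Request:
    """GET /v1/tasks/{task_id}: fetch the latest status."""
    return urllib.request.Request(f"{BASE_URL}/v1/tasks/{task_id}", method="GET")


def list_tasks_request(filters: Optional[dict] = None) -> urllib.request.Request:
    """POST /v1/tasks/list with a JSON body; the filter keys are hypothetical."""
    body = json.dumps(filters or {}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/tasks/list",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def kill_task_request(task_id: str) -> urllib.request.Request:
    """POST /v1/tasks/{task_id}/kill: request cancellation."""
    return urllib.request.Request(f"{BASE_URL}/v1/tasks/{task_id}/kill", method="POST")


# Build (but do not send) a request listing tasks that are still processing.
req = list_tasks_request({"status": "PROCESSING"})
```

Passing a built request to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes it easy to add authentication headers in one place and to inspect requests in tests.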

Best Practices

  1. Store task IDs returned by submit endpoints.
  2. Use exponential backoff to avoid hammering the API.
  3. Respect terminal states (COMPLETED, FAILED, CANCELED) and surface errors to operators.
  4. Leverage webhooks for side-effects like cache invalidation or notifications.
  5. Instrument monitoring: task history in MongoDB and webhook logs together provide a full audit trail.
Tasks keep the asynchronous parts of Mixpeek manageable; treat them as durable receipts for every long-running job.