POST /v1/clusters/jobs/submit
Submit Clustering Job
curl --request POST \
  --url https://api.mixpeek.com/v1/clusters/jobs/submit \
  --header 'Authorization: Bearer your_api_key' \
  --header 'Content-Type: application/json' \
  --data '{
  "collection_ids": [
    "col_products_v1",
    "col_products_v2"
  ],
  "compute_metrics": true,
  "config": {
    "algorithm": "kmeans",
    "algorithm_params": {
      "max_iter": 300,
      "n_clusters": 5
    },
    "feature_vector": {
      "vector_name": "text_extractor_v1_embedding"
    },
    "normalize_features": true
  },
  "include_members": false,
  "sample_size": 10000,
  "store_results": true
}'
{
  "task_id": "task_123",
  "task_type": "api_namespaces_create",
  "status": "IN_PROGRESS",
  "inputs": [
    "file1.pdf",
    {
      "config": {
        "key": "value"
      }
    }
  ],
  "outputs": [
    "processed_file1.pdf",
    {
      "result": "success"
    }
  ],
  "additional_data": {
    "priority": "high",
    "user_id": "user_456"
  },
  "error_message": "<string>"
}
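The curl request above can also be reproduced in Python with only the standard library. This is a minimal sketch, not an official client. It builds the same `urllib.request.Request` (method, URL, headers, and JSON body as in the curl example) but does not send it. The placeholder key `sk_...` is illustrative only.

```python
import json
import urllib.request

SUBMIT_URL = "https://api.mixpeek.com/v1/clusters/jobs/submit"


def build_submit_request(api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = {
        "collection_ids": ["col_products_v1", "col_products_v2"],
        "compute_metrics": True,
        "config": {
            "algorithm": "kmeans",
            "algorithm_params": {"max_iter": 300, "n_clusters": 5},
            "feature_vector": {"vector_name": "text_extractor_v1_embedding"},
            "normalize_features": True,
        },
        "include_members": False,
        "sample_size": 10000,
        "store_results": True,
    }
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage (commented out so the sketch never hits the network):
# req = build_submit_request("sk_...")
# with urllib.request.urlopen(req) as resp:
#     task = json.load(resp)          # a task object like the example above
#     print(task["task_id"], task["status"])
```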

Headers

Authorization
string | null

Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings. Example: 'Bearer sk_1234567890abcdef'

X-Namespace
string | null

Optional namespace for data isolation. This can be a namespace name or namespace ID. Example: 'netflix_prod' or 'ns_1234567890'. To create a namespace, use the /namespaces endpoint.
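Both headers can be assembled in one place. The sketch below assumes the `Content-Type: application/json` header from the curl example is always sent. It adds `X-Namespace` only when a namespace name or ID is supplied. The helper name `build_headers` is hypothetical.

```python
from typing import Dict, Optional


def build_headers(api_key: str, namespace: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for the submit call.

    namespace may be a name (e.g. 'netflix_prod') or an ID (e.g. 'ns_1234567890');
    the X-Namespace header is omitted entirely when namespace is None.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",  # format: 'Bearer your_api_key'
        "Content-Type": "application/json",
    }
    if namespace is not None:
        headers["X-Namespace"] = namespace
    return headers
```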

Query Parameters

cluster_id
string | null

Optional cluster ID that links this job to the corresponding cluster document.
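Because `cluster_id` is a query parameter rather than a body field, it goes on the URL. A minimal sketch using `urllib.parse.urlencode` is shown below. The function name `submit_url` and the sample ID `clu_123` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.mixpeek.com/v1/clusters/jobs/submit"


def submit_url(cluster_id: Optional[str] = None) -> str:
    """Return the submit URL, appending ?cluster_id=... only when one is given."""
    if cluster_id is None:
        return BASE_URL
    # urlencode percent-escapes any characters that are unsafe in a query string
    return f"{BASE_URL}?{urlencode({'cluster_id': cluster_id})}"
```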

Body

application/json

Request to execute clustering on one or more collections.

collection_ids
string[]
required

IDs of the collections to cluster together

Minimum length: 1
config
object
required

Clustering configuration including algorithm and parameters

sample_size
integer | null

Number of documents to sample for clustering

store_results
boolean
default:true

Whether to store clustering results

include_members
boolean
default:false

Whether to include cluster membership in results

compute_metrics
boolean
default:true

Whether to compute clustering quality metrics

save_artifacts
boolean
default:false

Whether to save clustering artifacts (e.g., Parquet files) to S3
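The body fields above can be collected into one request body that applies the documented defaults. The sketch below also enforces the minimum length of 1 for `collection_ids`. It assumes that leaving out a nullable `sample_size` is acceptable to the API, so the key is dropped when it is unset. `config` is passed through unchanged, for example the `kmeans` configuration from the curl example. The helper name `build_submit_body` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_submit_body(
    collection_ids: List[str],
    config: Dict[str, Any],
    sample_size: Optional[int] = None,
    store_results: bool = True,
    include_members: bool = False,
    compute_metrics: bool = True,
    save_artifacts: bool = False,
) -> Dict[str, Any]:
    """Assemble a request body using the defaults listed on this page."""
    if len(collection_ids) < 1:
        raise ValueError("collection_ids must contain at least one collection ID")
    body: Dict[str, Any] = {
        "collection_ids": list(collection_ids),
        "config": config,
        "store_results": store_results,
        "include_members": include_members,
        "compute_metrics": compute_metrics,
        "save_artifacts": save_artifacts,
    }
    # Assumption: omitting the nullable sample_size is acceptable when unset
    if sample_size is not None:
        body["sample_size"] = sample_size
    return body
```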

Response

Successful Response

Task response.

task_id
string
required

Unique identifier for the task

Example:

"task_123"

task_type
enum<string>
required

Type of the task

Available options:
api_namespaces_create,
api_buckets_objects_create,
api_buckets_delete,
api_buckets_batches_process,
api_buckets_batches_submit,
api_taxonomies_create,
api_taxonomies_execute,
api_taxonomies_materialize,
engine_feature_extractor_run,
engine_inference_run,
engine_object_processing,
engine_cluster_build,
thumbnail,
materialize
status
enum<string>
required

Current status of the task

Available options:
PENDING,
IN_PROGRESS,
PROCESSING,
COMPLETED,
FAILED,
CANCELED,
UNKNOWN,
SKIPPED,
DRAFT
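The submit call returns a task whose `status` may still be `PENDING` or `IN_PROGRESS`, so clients typically poll until the job settles. This page does not document a task-status endpoint or say which statuses are final. The sketch below therefore takes a caller-supplied `fetch_task` callable (hypothetical). It also assumes that `COMPLETED`, `FAILED`, `CANCELED`, and `SKIPPED` are terminal.

```python
import time
from enum import Enum
from typing import Any, Callable, Dict


class TaskStatus(str, Enum):
    PENDING = "PENDING"
    IN_PROGRESS = "IN_PROGRESS"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELED = "CANCELED"
    UNKNOWN = "UNKNOWN"
    SKIPPED = "SKIPPED"
    DRAFT = "DRAFT"


# Assumption: a task in one of these states will not change any further.
TERMINAL = {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELED, TaskStatus.SKIPPED}


def wait_for_task(
    fetch_task: Callable[[str], Dict[str, Any]],  # hypothetical: returns a task object
    task_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> Dict[str, Any]:
    """Poll fetch_task(task_id) until the status is terminal or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task(task_id)
        if TaskStatus(task["status"]) in TERMINAL:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task['status']} after {timeout}s")
        time.sleep(interval)
```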
inputs
array

List of input parameters or data for the task

Example:
[
"file1.pdf",
{ "config": { "key": "value" } }
]
outputs
array

List of output results from the task

Example:
[
"processed_file1.pdf",
{ "result": "success" }
]
additional_data
object | null

Additional metadata or context for the task

Example:
{ "priority": "high", "user_id": "user_456" }
error_message
string | null

Flattened error message derived from additional_data['error'] if present.
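On the client side, the response fields can be mapped onto a small typed structure. In the sketch below, `task_id`, `task_type`, and `status` are required, and the remaining fields are optional. If `error_message` is absent, it falls back to `str(additional_data['error'])`. That fallback is an assumption that mirrors the description above, not documented server behavior. The names `TaskResponse` and `parse_task` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TaskResponse:
    task_id: str
    task_type: str
    status: str
    inputs: List[Any] = field(default_factory=list)
    outputs: List[Any] = field(default_factory=list)
    additional_data: Optional[Dict[str, Any]] = None
    error_message: Optional[str] = None


def parse_task(data: Dict[str, Any]) -> TaskResponse:
    """Build a TaskResponse from a decoded JSON response body."""
    additional = data.get("additional_data")
    error_message = data.get("error_message")
    # Assumption: derive the message from additional_data['error'] if the
    # server did not already populate error_message.
    if error_message is None and additional and "error" in additional:
        error_message = str(additional["error"])
    return TaskResponse(
        task_id=data["task_id"],
        task_type=data["task_type"],
        status=data["status"],
        inputs=data.get("inputs") or [],
        outputs=data.get("outputs") or [],
        additional_data=additional,
        error_message=error_message,
    )
```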