GET /v1/clusters/{cluster_id}/executions
Get Latest Cluster Execution
curl --request GET \
  --url https://api.mixpeek.com/v1/clusters/{cluster_id}/executions \
  --header 'Authorization: <authorization>' \
  --header 'X-Namespace: <x-namespace>'
{
  "run_id": "<string>",
  "cluster_id": "<string>",
  "status": "pending",
  "num_clusters": 1,
  "num_points": 1,
  "created_at": "2023-11-07T05:31:56Z",
  "metrics": {
    "calinski_harabasz_score": 1234.56,
    "davies_bouldin_index": 0.42,
    "description": "Excellent clustering quality",
    "silhouette_score": 0.85
  },
  "centroids": [
    {
      "cluster_id": "<string>",
      "num_members": 1,
      "label": "Product Reviews",
      "summary": "This cluster contains documents related to product reviews and customer feedback.",
      "keywords": [
        "reviews",
        "products",
        "feedback"
      ]
    }
  ],
  "completed_at": "2025-11-13T13:25:40.122000Z",
  "error_message": "Insufficient documents for clustering: need at least 2 documents",
  "llm_labeling_errors": [
    "{\"error\": \"LLM API timeout\", \"clusters\": [\"cl_3\", \"cl_5\"]}",
    "{\"error\": \"No representative documents\", \"clusters\": [\"cl_1\"]}"
  ]
}
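
For reference, a minimal Python sketch of the same request (assumes the requests library; the API key, namespace, and cluster ID values are placeholders):

import requests

# Placeholder values: substitute your own API key, namespace, and cluster ID.
API_KEY = "sk_xxxxxxxxxxxxx"
NAMESPACE = "my-namespace"
CLUSTER_ID = "clust_abc123"

resp = requests.get(
    f"https://api.mixpeek.com/v1/clusters/{CLUSTER_ID}/executions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "X-Namespace": NAMESPACE,
    },
)
resp.raise_for_status()
execution = resp.json()
print(execution["status"], execution["num_clusters"])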

Headers

Authorization
string
required

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

X-Namespace
string
required

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'

Path Parameters

cluster_id
string
required

Cluster ID

Response

Successful Response

Complete results from a single clustering execution.

Represents the outcome of running a clustering algorithm on a collection's documents. Each execution creates a snapshot of clustering results at a point in time, including the clusters found, quality metrics, and semantic labels.

Use Cases:
- Display clustering execution history in the UI
- Compare clustering quality across multiple runs
- Track execution status for long-running jobs
- Debug failed clustering attempts
- View cluster summaries and labels for analysis

Workflow:
1. Create cluster configuration → POST /clusters
2. Execute clustering → POST /clusters/{id}/execute
3. Poll execution status → GET /clusters/{id}/executions
4. View execution history → POST /clusters/{id}/executions/list

Status Lifecycle: pending → processing → completed (or failed)

Note: Execution results are immutable once completed. Re-running clustering creates a new execution result with a new run_id.
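
As a rough illustration of step 3 in the workflow above, a polling sketch that follows this status lifecycle (Python; assumes the requests library, the headers shown in the example request, and an arbitrary 10-second retry interval, none of which are mandated by the API):

import time
import requests

def wait_for_execution(cluster_id, headers, interval_s=10, timeout_s=600):
    """Poll the latest execution until it reaches a terminal status."""
    url = f"https://api.mixpeek.com/v1/clusters/{cluster_id}/executions"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()
        execution = resp.json()
        if execution["status"] == "completed":
            return execution
        if execution["status"] == "failed":
            raise RuntimeError(f"Clustering failed: {execution.get('error_message')}")
        # Still 'pending' or 'processing'; wait before the next poll.
        time.sleep(interval_s)
    raise TimeoutError("Clustering did not reach a terminal status before the timeout")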

run_id
string
required

REQUIRED. Unique identifier for this specific clustering execution. Format: 'run_' prefix followed by random alphanumeric string. Used to retrieve specific execution artifacts and results. Each re-execution of the same cluster creates a new run_id. References execution artifacts in S3 and MongoDB.

cluster_id
string
required

REQUIRED. Parent cluster configuration that was executed. Format: 'clust_' prefix followed by random alphanumeric string. Links this execution back to the cluster definition. Multiple executions can share the same cluster_id.

status
enum<string>
required

REQUIRED. Current status of the clustering execution. Values: 'pending' = Job queued, waiting to start. 'processing' = Clustering algorithm running (may take minutes for large datasets). 'completed' = Clustering finished successfully, results available. 'failed' = Clustering failed, check error_message for details. Status changes: pending → processing → (completed OR failed). Poll this field to track job progress.

Available options:
pending,
processing,
completed,
failed
num_clusters
integer
required

REQUIRED. Number of clusters found by the clustering algorithm. Range: 1 to num_points (though typically much lower). Interpretation: too few clusters suggests over-generalization and may call for a higher n_clusters param; too many clusters suggests overfitting and may call for a lower n_clusters param. The optimal value depends on the dataset and use case. Available immediately upon completion, even if metrics fail.

Required range: x >= 0
num_points
integer
required

REQUIRED. Total number of documents/points that were clustered. Equals the count of documents in the collection at execution time. Note: this may differ across executions if documents were added or removed. Used to calculate metrics and validate clustering quality. A minimum of 2 points is required for clustering (with fewer, each point is trivially its own cluster).

Required range: x >= 0
created_at
string<date-time>
required

REQUIRED. Timestamp when the clustering execution started. ISO 8601 format with timezone (UTC). Used to: - Sort executions chronologically. - Calculate execution duration (completed_at - created_at). - Filter execution history by date range. Always present, even for failed executions.
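
A small sketch of the chronological-sort use case (Python standard library; assumes a list of execution dicts such as the history endpoint mentioned in the workflow would return):

from datetime import datetime

def latest_first(executions):
    """Order execution results newest-first by their created_at timestamp."""
    # fromisoformat() on Python < 3.11 does not accept a trailing 'Z',
    # so it is rewritten as an explicit UTC offset first.
    def created(e):
        return datetime.fromisoformat(e["created_at"].replace("Z", "+00:00"))
    return sorted(executions, key=created, reverse=True)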

metrics
ClusterExecutionMetrics · object

OPTIONAL. Quality metrics evaluating clustering performance. NOT REQUIRED - only present for successful executions. null if: - Execution is still pending/processing. - Execution failed. - Too few points to calculate metrics (need 2+ points). Contains silhouette_score, davies_bouldin_index, calinski_harabasz_score. Use to compare quality across multiple executions.

Example:
{
  "calinski_harabasz_score": 1234.56,
  "davies_bouldin_index": 0.42,
  "description": "Excellent clustering quality",
  "silhouette_score": 0.85
}
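
A sketch of comparing quality across runs with these metrics (Python; the weighting below is an illustrative assumption, not an API-defined ranking):

def pick_better_run(run_a, run_b):
    """Prefer the execution with better clustering-quality metrics.

    Higher silhouette_score and calinski_harabasz_score are better;
    lower davies_bouldin_index is better.
    """
    def score(run):
        m = run.get("metrics") or {}
        return (
            m.get("silhouette_score", 0.0)
            - m.get("davies_bouldin_index", 0.0)
            + m.get("calinski_harabasz_score", 0.0) / 1000.0  # rough rescaling (assumption)
        )
    return run_a if score(run_a) >= score(run_b) else run_b
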
centroids
ClusterExecutionCentroid · object[] | null

OPTIONAL. List of cluster centroids with semantic labels. NOT REQUIRED - only present for completed executions with LLM labeling enabled. Length: equals num_clusters. Each centroid contains:
- cluster_id: Identifier for the cluster (e.g., 'cl_0').
- num_members: Count of documents in this cluster.
- label: Human-readable cluster name (e.g., 'Product Reviews').
- summary: Brief description of cluster content.
- keywords: Array of representative terms.
null if the execution is pending/processing/failed, or if LLM labeling is not configured. Use for: displaying cluster summaries in the UI, filtering by cluster.
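
A short sketch of rendering these centroid fields for display (Python; field names follow the documentation above, the output format itself is an assumption):

def format_centroids(execution):
    """Build one human-readable line per labeled cluster centroid."""
    lines = []
    for centroid in execution.get("centroids") or []:
        keywords = ", ".join(centroid.get("keywords") or [])
        lines.append(
            f"{centroid['label']} ({centroid['num_members']} members): "
            f"{centroid['summary']} [keywords: {keywords}]"
        )
    return "\n".join(lines)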

completed_at
string<date-time> | null

OPTIONAL. Timestamp when the clustering execution finished. ISO 8601 format with timezone (UTC). NOT REQUIRED - only present for completed or failed executions. null if: status is 'pending' or 'processing'. Use to: - Calculate execution duration (completed_at - created_at). - Show when results became available. Present for both successful and failed executions.

Example:

"2025-11-13T13:25:40.122000Z"

error_message
string | null

OPTIONAL. Error message if the clustering execution failed. NOT REQUIRED - only present when status is 'failed'. null if the execution succeeded or is still in progress. Contains a human-readable error description, possible causes and suggested fixes, and stack trace details (for debugging). Common errors:
- 'Insufficient documents for clustering' (need 2+ docs).
- 'Feature extractor not found' (invalid collection config).
- 'Out of memory' (dataset too large for algorithm).
Use for: debugging failed executions and user-facing error messages.

Example:

"Insufficient documents for clustering: need at least 2 documents"

llm_labeling_errors
string[] | null

OPTIONAL. List of errors encountered during LLM labeling. NOT REQUIRED - only present when LLM labeling was attempted and encountered errors. null if LLM labeling was not enabled, LLM labeling succeeded for all clusters, or the execution is still in progress. Each error is a JSON string containing:
- 'error': Human-readable error message.
- 'clusters': List of cluster IDs affected by this error.
Common errors:
- 'LLM API timeout for 2 clusters' (network/API issues).
- 'OpenAI rate limit exceeded' (quota exhausted).
- 'Invalid model name: gpt-3.5' (config error).
- 'No representative documents for cluster cl_3' (empty cluster).
Use for: debugging why some clusters have fallback labels, identifying LLM API issues without failing the entire clustering run, and warning users about partial labeling success.

Example:
[
  "{\"error\": \"LLM API timeout\", \"clusters\": [\"cl_3\", \"cl_5\"]}",
  "{\"error\": \"No representative documents\", \"clusters\": [\"cl_1\"]}"
]
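
Because each entry is itself a JSON string, a small decoding sketch (Python standard library; each decoded entry carries the documented 'error' and 'clusters' keys):

import json

def parse_labeling_errors(execution):
    """Decode llm_labeling_errors entries into dicts for easier inspection."""
    entries = execution.get("llm_labeling_errors") or []
    return [json.loads(entry) for entry in entries]

# Example result:
# [{'error': 'LLM API timeout', 'clusters': ['cl_3', 'cl_5']},
#  {'error': 'No representative documents', 'clusters': ['cl_1']}]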