Get Specific Cluster Execution

Headers

Authorization

string

required

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

X-Namespace

string

required

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'.

Path Parameters

cluster_id

string

required

Cluster ID

run_id

string

required

Run ID
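The headers and path parameters above combine into a single GET request. Below is a minimal sketch using only Python's standard library. The route `/clusters/{cluster_id}/executions/{run_id}` and the base URL are assumptions inferred from this endpoint's name and the workflow routes listed under Response, not taken from this reference, so verify both before use. The `sk_` prefix check mirrors the documented key format, and `build_execution_request` is a hypothetical helper name.

```python
import urllib.request
from urllib.parse import quote

# ASSUMPTION: route inferred from the endpoint name; verify against the API docs.
EXECUTION_PATH = "/clusters/{cluster_id}/executions/{run_id}"


def build_execution_request(
    base_url: str, api_key: str, namespace: str, cluster_id: str, run_id: str
) -> urllib.request.Request:
    """Build, but do not send, a GET request for one clustering execution."""
    if not api_key.startswith("sk_"):
        # Documented header format is 'Bearer sk_xxxxxxxxxxxxx'.
        raise ValueError("expected an API key starting with 'sk_'")
    path = EXECUTION_PATH.format(
        cluster_id=quote(cluster_id, safe=""),  # percent-encode each path segment
        run_id=quote(run_id, safe=""),
    )
    headers = {
        "Authorization": f"Bearer {api_key}",
        "X-Namespace": namespace,  # namespace name or ns_... ID
    }
    return urllib.request.Request(
        base_url.rstrip("/") + path, headers=headers, method="GET"
    )
```

Sending the request with `urllib.request.urlopen(req)` and decoding the body with `json.load` should yield the execution object described under Response.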

Response

Successful Response

Complete results from a single clustering execution.

Represents the outcome of running a clustering algorithm on a collection's documents. Each execution creates a snapshot of clustering results at a point in time, including the clusters found, quality metrics, and semantic labels.

Use Cases:
- Display clustering execution history in UI
- Compare clustering quality across multiple runs
- Track execution status for long-running jobs
- Debug failed clustering attempts
- View cluster summaries and labels for analysis

Workflow:
1. Create cluster configuration → POST /clusters
2. Execute clustering → POST /clusters/{id}/execute
3. Poll execution status → GET /clusters/{id}/executions
4. View execution history → POST /clusters/{id}/executions/list

Status Lifecycle: pending → processing → completed (or failed)

Note: Execution results are immutable once completed. Re-running clustering creates a new execution result with a new run_id.
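The status lifecycle lends itself to a simple polling loop. A minimal sketch, assuming the caller supplies a `fetch` function (hypothetical) that calls this endpoint and returns the decoded JSON body. The 5-second interval and 10-minute timeout are arbitrary defaults chosen for illustration, not documented limits.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_execution(
    fetch: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll `fetch` until the execution reaches a terminal status.

    `fetch` is supplied by the caller and should return the parsed JSON
    body of this endpoint (e.g. by sending the GET request and decoding it).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        execution = fetch()
        if execution["status"] in TERMINAL_STATUSES:
            return execution  # completed or failed: results are now immutable
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"run {execution.get('run_id')} still {execution['status']}"
            )
        sleep(interval_s)
```

Injecting `sleep` keeps the loop testable without real waiting; in production the default `time.sleep` applies.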

run_id

string

required

REQUIRED. Unique identifier for this specific clustering execution. Format: 'run_' prefix followed by random alphanumeric string. Used to retrieve specific execution artifacts and results. Each re-execution of the same cluster creates a new run_id. References execution artifacts in S3 and MongoDB.

cluster_id

string

required

REQUIRED. Parent cluster configuration that was executed. Format: 'clust_' prefix followed by random alphanumeric string. Links this execution back to the cluster definition. Multiple executions can share the same cluster_id.
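Because run IDs and cluster IDs carry distinct prefixes, a client can catch swapped or malformed path parameters before sending a request. A sketch, assuming the random suffix consists only of ASCII letters and digits; the reference says "alphanumeric" but gives no length or exact alphabet. `validate_ids` is a hypothetical helper.

```python
import re

# Documented prefixes; the suffix alphabet (ASCII letters and digits) is an ASSUMPTION.
RUN_ID_PATTERN = re.compile(r"run_[A-Za-z0-9]+")
CLUSTER_ID_PATTERN = re.compile(r"clust_[A-Za-z0-9]+")


def validate_ids(cluster_id: str, run_id: str) -> None:
    """Raise ValueError if either identifier does not match its documented prefix."""
    if not CLUSTER_ID_PATTERN.fullmatch(cluster_id):
        raise ValueError(f"not a cluster ID: {cluster_id!r}")
    if not RUN_ID_PATTERN.fullmatch(run_id):
        raise ValueError(f"not a run ID: {run_id!r}")
```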

status

enum<string>

required

REQUIRED. Current status of the clustering execution. Values: 'pending' = Job queued, waiting to start. 'processing' = Clustering algorithm running (may take minutes for large datasets). 'completed' = Clustering finished successfully, results available. 'failed' = Clustering failed, check error_message for details. Status changes: pending → processing → (completed OR failed). Poll this field to track job progress.

Available options:

pending,

processing,

completed,

failed
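A client can model the four values as an enum and sanity-check successive polling results against the documented lifecycle. A sketch; the class and helper names are hypothetical. Since a poll can miss an intermediate state (for example, seeing `pending` and then `completed`), the check rejects only backward moves and changes out of a terminal state.

```python
from enum import Enum


class ExecutionStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"

    @property
    def is_terminal(self) -> bool:
        return self in (ExecutionStatus.COMPLETED, ExecutionStatus.FAILED)


# Position in the documented lifecycle: pending -> processing -> completed | failed.
_STAGE = {
    ExecutionStatus.PENDING: 0,
    ExecutionStatus.PROCESSING: 1,
    ExecutionStatus.COMPLETED: 2,
    ExecutionStatus.FAILED: 2,
}


def is_plausible_change(previous: str, current: str) -> bool:
    """Check that two polled statuses are consistent with the lifecycle."""
    prev, cur = ExecutionStatus(previous), ExecutionStatus(current)
    if prev.is_terminal:
        return cur == prev  # terminal states never change
    return _STAGE[cur] >= _STAGE[prev]  # forward moves (possibly skipped) only
```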

num_clusters

integer

required

REQUIRED. Number of clusters found by the clustering algorithm. Range: 1 to num_points (typically much lower). Interpretation: Too few clusters = overgeneralization; consider a higher n_clusters param. Too many clusters = overfitting; consider a lower n_clusters param. The optimal value depends on the dataset and use case. Available immediately upon completion, even if metrics calculation fails.

Required range: x >= 0

num_points

integer

required

REQUIRED. Total number of documents/points that were clustered. Equals the count of documents in the collection at execution time, so it may differ across executions if documents were added or removed. Used to calculate metrics and validate clustering quality. At least 2 points are required for clustering; with fewer, each point would trivially form its own cluster.

Required range: x >= 0

created_at

string<date-time>

required

REQUIRED. Timestamp when the clustering execution started. ISO 8601 format with timezone (UTC). Used to: - Sort executions chronologically. - Calculate execution duration (completed_at - created_at). - Filter execution history by date range. Always present, even for failed executions.

metrics

ClusterExecutionMetrics · object

OPTIONAL. Quality metrics evaluating clustering performance. NOT REQUIRED - only present for successful executions. null if: - Execution is still pending/processing. - Execution failed. - Too few points to calculate metrics (need 2+ points). Contains silhouette_score, davies_bouldin_index, calinski_harabasz_score. Use to compare quality across multiple executions.


metrics.silhouette_score

number | null

OPTIONAL. Silhouette score measuring cluster cohesion and separation. Range: -1 to +1. Interpretation: +1.0 = Perfect clustering (documents far from other clusters, close to own cluster). 0.0 = Overlapping clusters (documents on cluster boundaries). -1.0 = Poor clustering (documents assigned to wrong clusters). Practical thresholds: 0.7 to 1.0 = Excellent clustering. 0.5 to 0.7 = Good clustering. 0.25 to 0.5 = Weak clustering; consider different parameters. Below 0.25 = Poor clustering; reconfigure or add more data. null = metric not calculated (too few points or clustering failed).

Required range: -1 <= x <= 1

Example:

0.85
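For intuition about what this score measures, here is a small pure-Python version of the standard silhouette coefficient, s = (b - a) / max(a, b), using Euclidean distance, together with the practical thresholds above. It is an illustration only: the distance metric and edge-case handling used by the service are not documented here. For real workloads, scikit-learn's `sklearn.metrics.silhouette_score` computes the same quantity.

```python
import math
from typing import List, Optional, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    labels = list(labels)
    clusters = set(labels)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("need between 2 and n_points - 1 clusters")
    scores: List[float] = []
    for i, p in enumerate(points):
        own = labels[i]
        if labels.count(own) == 1:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        # Mean distance from p to the members of each cluster, excluding p itself.
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            mean_dist[c] = sum(d) / len(d)
        a = mean_dist[own]  # cohesion: own cluster
        b = min(v for c, v in mean_dist.items() if c != own)  # nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def interpret_silhouette(score: Optional[float]) -> str:
    """Map a score (or null) to the practical thresholds listed above."""
    if score is None:
        return "not calculated"
    if score >= 0.7:
        return "excellent"
    if score >= 0.5:
        return "good"
    if score >= 0.25:
        return "weak"
    return "poor"
```

Two tight, well-separated 1-D groups such as `[[0.0], [1.0], [10.0], [11.0]]` with labels `[0, 0, 1, 1]` score about 0.9, while interleaving the labels drives the score negative.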

metrics.davies_bouldin_index

number | null

OPTIONAL. Davies-Bouldin index measuring cluster separation. Range: 0 to +∞ (lower is better, no upper bound). Interpretation: 0.0 = Perfect separation (impossible in practice). 0.0 to 1.0 = Excellent separation. 1.0 to 2.0 = Good separation. Above 2.0 = Poor separation, clusters overlap. Formula: for each cluster, take the largest ratio of within-cluster scatter (summed over the pair of clusters) to the distance between the two cluster centroids, then average these values over all clusters. Use when: Validating that clusters are distinct and well-separated. null = metric not calculated (too few points or clustering failed).

Required range: x >= 0

Example:

0.45
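The formula can be made concrete with a short pure-Python sketch using Euclidean distance, where each cluster's scatter is the mean distance of its members to the centroid. This is an illustration of the standard definition, not the service's code; `sklearn.metrics.davies_bouldin_score` is a production alternative.

```python
import math
from typing import Dict, List, Sequence


def davies_bouldin_index(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Davies-Bouldin index with Euclidean distance (lower is better)."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    if len(groups) < 2:
        raise ValueError("need at least 2 clusters")
    centroids = {c: [sum(x) / len(g) for x in zip(*g)] for c, g in groups.items()}
    # s_i: average distance from each member of cluster i to its centroid.
    scatter = {
        c: sum(math.dist(p, centroids[c]) for p in g) / len(g) for c, g in groups.items()
    }
    total = 0.0
    for i in groups:
        # Worst-case (largest) similarity ratio of cluster i to any other cluster.
        # Assumes distinct centroids; identical centroids would divide by zero.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in groups
            if j != i
        )
    return total / len(groups)
```

For two clusters of width 1 whose centroids are 10 apart, each ratio is (0.5 + 0.5) / 10, giving an index of 0.1, well inside the "excellent" band.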

metrics.calinski_harabasz_score

number | null

OPTIONAL. Calinski-Harabasz score (also called Variance Ratio Criterion). Range: 0 to +∞ (higher is better, no strict upper bound). Interpretation: Higher values indicate denser, better-separated clusters. No universal threshold; compare relative values across runs. Typical good values: 100-1000+ (dataset dependent). Formula: Ratio of between-cluster to within-cluster dispersion. Use when: Comparing different numbers of clusters for the same dataset. Note: Biased toward algorithms that produce spherical, equally-sized clusters. null = metric not calculated (too few points or clustering failed).

Required range: x >= 0

Example:

456.78
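The ratio can be written out directly: between-cluster dispersion divided by (k - 1), over within-cluster dispersion divided by (n - k), for n points and k clusters. A pure-Python sketch of that standard definition, for illustration only; returning infinity when all clusters have zero spread is a convention chosen for this sketch. `sklearn.metrics.calinski_harabasz_score` is a production alternative.

```python
import math
from typing import Dict, List, Sequence


def _mean(pts: Sequence[Sequence[float]]) -> List[float]:
    return [sum(x) / len(pts) for x in zip(*pts)]


def calinski_harabasz_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Variance Ratio Criterion: between / within dispersion, scaled by degrees of freedom."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    n, k = len(points), len(groups)
    if not 2 <= k < n:
        raise ValueError("need 2 <= n_clusters < n_points")
    overall = _mean(points)
    between = 0.0  # sum over clusters of size * squared centroid-to-overall-mean distance
    within = 0.0   # sum of squared distances of points to their own centroid
    for g in groups.values():
        c = _mean(g)
        between += len(g) * math.dist(c, overall) ** 2
        within += sum(math.dist(p, c) ** 2 for p in g)
    if within == 0:
        return math.inf  # perfectly tight clusters; convention chosen for this sketch
    return (between / (k - 1)) / (within / (n - k))
```

Because the absolute value depends on the data's scale and size, this score is most useful for comparing runs over the same collection, as the description above notes.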

Example:

{
  "calinski_harabasz_score": 1234.56,
  "davies_bouldin_index": 0.42,
  "description": "Excellent clustering quality",
  "silhouette_score": 0.85
}
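The metrics object is meant for comparing runs. A sketch of one possible selection rule, assuming you already hold parsed execution objects (for example from repeated calls to this endpoint). Ranking by silhouette score and breaking ties on the lower Davies-Bouldin index is a choice made for illustration, not a documented recommendation; `best_execution` is a hypothetical helper.

```python
import math
from typing import Any, Dict, Iterable, Optional


def best_execution(executions: Iterable[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the completed execution with the highest silhouette score.

    Ties are broken by the lower Davies-Bouldin index. Executions without
    metrics (pending, processing, failed, or too few points) are skipped.
    """
    scored = [
        e for e in executions
        if e.get("status") == "completed"
        and (e.get("metrics") or {}).get("silhouette_score") is not None
    ]
    if not scored:
        return None

    def rank(e: Dict[str, Any]):
        m = e["metrics"]
        db = m.get("davies_bouldin_index")
        # Missing Davies-Bouldin values sort last among equal silhouettes.
        return (m["silhouette_score"], -(db if db is not None else math.inf))

    return max(scored, key=rank)
```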

centroids

ClusterExecutionCentroid · object[] | null

OPTIONAL. List of cluster centroids with semantic labels. NOT REQUIRED - only present for completed executions with LLM labeling enabled. Length: equals num_clusters. Each centroid contains: - cluster_id: Identifier for the cluster (e.g., 'cl_0'). - num_members: Count of documents in this cluster. - label: Human-readable cluster name (e.g., 'Product Reviews'). - summary: Brief description of cluster content. - keywords: Array of representative terms. null if: - Execution pending/processing/failed. - LLM labeling not configured. Use for: Displaying cluster summaries in UI, filtering by cluster.


centroids.cluster_id

string

required

REQUIRED. Unique identifier for this cluster within the execution. Format: 'cl_' prefix followed by numeric index (e.g., 'cl_0', 'cl_1'). Used to reference this specific cluster in queries and enrichments. Consistent across executions if the algorithm is deterministic.

centroids.num_members

integer

required

REQUIRED. Number of documents/points assigned to this cluster. Indicates cluster size, e.g. for sizing bubbles in visualizations. Minimum: 1 for K-Means, which assigns every point to a cluster. Can be 0 for noise clusters in HDBSCAN (which labels noise points -1).

Required range: x >= 0

centroids.label

string | null

OPTIONAL. Human-readable label generated by LLM (e.g., GPT-4o-mini). Automatically generated when llm_labeling.enabled = true in cluster config. NOT REQUIRED when LLM labeling disabled. Describes the semantic meaning of documents in this cluster. Example: 'Product Reviews', 'Technical Documentation', 'Customer Support'.

Example:

"Product Reviews"

centroids.summary

string | null

OPTIONAL. Detailed description generated by LLM. Automatically generated when llm_labeling.include_summary = true. NOT REQUIRED when LLM labeling disabled or summary not requested. Provides context about what types of documents are in this cluster. Useful for tooltips, expanded views, or detailed explanations.

Example:

"This cluster contains documents related to product reviews and customer feedback."

centroids.keywords

string[] | null

OPTIONAL. List of semantic keywords generated by LLM. Automatically generated when llm_labeling.include_keywords = true. NOT REQUIRED when LLM labeling disabled or keywords not requested. Useful for search, filtering, and quick cluster understanding. Typically 3-5 keywords per cluster.

Example:

["reviews", "products", "feedback"]
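Since `centroids` and each centroid's `label` may be null, UI code typically needs a fallback. A sketch that builds a display name per cluster; the fallback format `"Cluster cl_N"` and the helper name are hypothetical choices for this example.

```python
from typing import Any, Dict


def cluster_display_names(execution: Dict[str, Any]) -> Dict[str, str]:
    """Map each cluster_id to a UI-friendly name, tolerating missing LLM labels."""
    names: Dict[str, str] = {}
    for centroid in execution.get("centroids") or []:  # centroids may be null
        cid = centroid["cluster_id"]
        # label may be null, e.g. when LLM labeling is disabled
        label = centroid.get("label") or f"Cluster {cid}"
        names[cid] = f"{label} ({centroid['num_members']} docs)"
    return names
```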

completed_at

string<date-time> | null

OPTIONAL. Timestamp when the clustering execution finished. ISO 8601 format with timezone (UTC). NOT REQUIRED - only present for completed or failed executions. null if: status is 'pending' or 'processing'. Use to: - Calculate execution duration (completed_at - created_at). - Show when results became available. Present for both successful and failed executions.

Example:

"2025-11-13T13:25:40.122000Z"
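Both timestamps are ISO 8601 strings in UTC, so the execution duration (completed_at - created_at) can be computed directly. A sketch assuming both fields use the trailing `Z` form shown in the example; `execution_duration` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def _parse_utc(ts: str) -> datetime:
    # datetime.fromisoformat accepts a trailing 'Z' only from Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def execution_duration(execution: Dict[str, Any]) -> Optional[timedelta]:
    """completed_at - created_at, or None while the run is pending/processing."""
    completed_at = execution.get("completed_at")
    if completed_at is None:
        return None
    return _parse_utc(completed_at) - _parse_utc(execution["created_at"])
```

With a hypothetical `created_at` of `"2025-11-13T13:20:00Z"` and the example `completed_at` above, the duration is 5 minutes 40.122 seconds.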

error_message

string | null

OPTIONAL. Error message if the clustering execution failed. NOT REQUIRED - only present when status is 'failed'. null if: execution succeeded or is still in progress. Contains: - Human-readable error description. - Possible causes and suggested fixes. - Stack trace details (for debugging). Common errors: - 'Insufficient documents for clustering' (need 2+ docs). - 'Feature extractor not found' (invalid collection config). - 'Out of memory' (dataset too large for algorithm). Use for: Debugging failed executions and user error messages.

Example:

"Insufficient documents for clustering: need at least 2 documents"

llm_labeling_errors

string[] | null

OPTIONAL. List of errors encountered during LLM labeling. NOT REQUIRED - only present when LLM labeling was attempted and encountered errors. null if: - LLM labeling was not enabled. - LLM labeling succeeded for all clusters. - Execution is still in progress. Each error is a JSON string containing: - 'error': Human-readable error message. - 'clusters': List of cluster IDs affected by this error. Common errors: - 'LLM API timeout for 2 clusters' (network/API issues). - 'OpenAI rate limit exceeded' (quota exhausted). - 'Invalid model name: gpt-3.5' (config error). - 'No representative documents for cluster cl_3' (empty cluster). Use for: - Debugging why some clusters have fallback labels. - Identifying LLM API issues without failing entire clustering. - Warning users about partial labeling success.

Example:

[
  "{\"error\": \"LLM API timeout\", \"clusters\": [\"cl_3\", \"cl_5\"]}",
  "{\"error\": \"No representative documents\", \"clusters\": [\"cl_1\"]}"
]
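Because each entry is a JSON-encoded string rather than an object, it has to be decoded before use. A sketch that groups error messages by affected cluster, so a UI can flag clusters that received fallback labels. The `"<unparsed>"` key for entries that are not valid JSON is a placeholder chosen for this example.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def labeling_errors_by_cluster(execution: Dict[str, Any]) -> Dict[str, List[str]]:
    """Decode llm_labeling_errors and index the messages by cluster ID."""
    by_cluster: Dict[str, List[str]] = defaultdict(list)
    for raw in execution.get("llm_labeling_errors") or []:  # field may be null
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            by_cluster["<unparsed>"].append(raw)  # keep malformed entries visible
            continue
        for cluster_id in entry.get("clusters", []):
            by_cluster[cluster_id].append(entry.get("error", "unknown error"))
    return dict(by_cluster)
```

Applied to the example above, this yields one timeout message for each of `cl_3` and `cl_5` and one "No representative documents" message for `cl_1`.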
