POST /v1/clusters/execute
Execute Clustering
curl --request POST \
  --url https://api.mixpeek.com/v1/clusters/execute \
  --header 'Authorization: Bearer <your_api_key>' \
  --header 'Content-Type: application/json' \
  --data '{
  "collection_ids": [
    "col_products_v1",
    "col_products_v2"
  ],
  "compute_metrics": true,
  "config": {
    "algorithm": "kmeans",
    "algorithm_params": {
      "max_iter": 300,
      "n_clusters": 5
    },
    "feature_vector": {
      "vector_name": "text_extractor_v1_embedding"
    },
    "normalize_features": true
  },
  "include_members": false,
  "sample_size": 10000,
  "store_results": true
}'
{
  "algorithm": "kmeans",
  "centroids": [
    {
      "centroid_vector": [
        0.1,
        0.2,
        0.3
      ],
      "cluster_id": "cluster_001",
      "feature_dimensions": 512,
      "num_members": 2000,
      "vector_name": "product_embedding"
    }
  ],
  "created_at": "2024-01-15T10:30:00Z",
  "execution_time_ms": 5432,
  "members_key": "int_abc123/ns_xyz789/engine_cluster_build/run_xyz789/members.parquet",
  "metrics": {
    "davies_bouldin_score": 0.8,
    "silhouette_score": 0.65
  },
  "num_clusters": 5,
  "num_documents": 10000,
  "parquet_path": "int_abc123/ns_xyz789/engine_cluster_build/run_xyz789/clusters.parquet",
  "run_id": "run_xyz789",
  "success": true
}

Headers

Authorization
string | null

Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings. Example: 'Bearer sk_1234567890abcdef'

X-Namespace
string | null

Optional namespace for data isolation. This can be a namespace name or namespace ID. Example: 'netflix_prod' or 'ns_1234567890'. To create a namespace, use the /namespaces endpoint.
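
As a minimal sketch of the two headers described above, the following Python helper assembles them into a dictionary. The function name `build_headers` is hypothetical and not part of any Mixpeek SDK; the API key and namespace values are the placeholder examples from the descriptions above.

```python
from typing import Dict, Optional


def build_headers(api_key: str, namespace: Optional[str] = None) -> Dict[str, str]:
    """Return request headers in the format described in the Headers section.

    Hypothetical helper for illustration only.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    headers = {
        # Documented format: 'Bearer your_api_key'
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # X-Namespace is optional; it may be a namespace name or a namespace ID.
    if namespace:
        headers["X-Namespace"] = namespace
    return headers


# Placeholder values copied from the header descriptions.
headers = build_headers("sk_1234567890abcdef", namespace="netflix_prod")
```

Leaving `namespace` unset simply omits the `X-Namespace` header from the dictionary.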

Body

application/json

Request to execute clustering on one or more collections.

collection_ids
string[]
required

IDs of the collections to cluster together

Minimum length: 1

config
object
required

Clustering configuration including algorithm and parameters

sample_size
integer | null

Number of documents to sample for clustering

store_results
boolean
default:true

Whether to store clustering results

include_members
boolean
default:false

Whether to include cluster membership in results

compute_metrics
boolean
default:true

Whether to compute clustering quality metrics

save_artifacts
boolean
default:false

Whether to save clustering artifacts (e.g., parquet) to S3
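
The body fields above can be assembled and checked on the client before the request is sent. The sketch below is one way to do that in Python with the standard library only. The helper names `build_execute_body` and `build_request` are hypothetical, the defaults mirror the ones listed above, and the `config` contents are copied from the curl example rather than from a documented schema. Omitting `sample_size` when it is unset is a choice made here; the field also accepts null.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

EXECUTE_URL = "https://api.mixpeek.com/v1/clusters/execute"


def build_execute_body(
    collection_ids: List[str],
    config: Dict[str, Any],
    sample_size: Optional[int] = None,
    store_results: bool = True,
    include_members: bool = False,
    compute_metrics: bool = True,
    save_artifacts: bool = False,
) -> Dict[str, Any]:
    """Build the JSON body, enforcing the documented required fields."""
    if len(collection_ids) < 1:
        raise ValueError("collection_ids needs at least one collection ID")
    if not config:
        raise ValueError("config is required")
    body: Dict[str, Any] = {
        "collection_ids": list(collection_ids),
        "config": config,
        "store_results": store_results,
        "include_members": include_members,
        "compute_metrics": compute_metrics,
        "save_artifacts": save_artifacts,
    }
    # Assumption: leave sample_size out entirely when the caller did not set it.
    if sample_size is not None:
        body["sample_size"] = sample_size
    return body


def build_request(body: Dict[str, Any], api_key: str) -> urllib.request.Request:
    """Wrap the body in a POST request object without sending it."""
    return urllib.request.Request(
        EXECUTE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Values taken from the curl example at the top of this page.
body = build_execute_body(
    ["col_products_v1", "col_products_v2"],
    {
        "algorithm": "kmeans",
        "algorithm_params": {"max_iter": 300, "n_clusters": 5},
        "feature_vector": {"vector_name": "text_extractor_v1_embedding"},
        "normalize_features": True,
    },
    sample_size=10000,
)
request = build_request(body, "sk_1234567890abcdef")
```

Nothing is sent by this code. Passing `request` to `urllib.request.urlopen` would perform the call.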

Response

Successful Response

Response from cluster execution.

success
boolean
required

Whether clustering was successful

algorithm
enum<string>
required

Algorithm used for clustering

Available options: kmeans, dbscan, hdbscan, agglomerative, spectral, gaussian_mixture, mean_shift, optics

num_clusters
integer
required

Number of clusters found

num_documents
integer
required

Number of documents clustered

centroids
ClusterCentroid · object[]
required

Cluster centroids with features

execution_time_ms
integer
required

Total execution time in milliseconds

run_id
string

Unique identifier for this clustering run

metrics
object

Clustering quality metrics

parquet_path
string | null

S3 key of the Parquet file containing the full results

members_key
string | null

S3 key to members.parquet (if saved)

created_at
string<date-time>

Timestamp of clustering
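
A client will typically check the required response fields before reading the optional ones. The following Python sketch shows one way to do that against a trimmed copy of the example response at the top of this page. The function name `summarize_response` is hypothetical, and the centroid keys `cluster_id` and `num_members` are taken from that example rather than from a documented `ClusterCentroid` schema.

```python
from typing import Any, Dict

# Fields marked "required" in the response description above.
REQUIRED_FIELDS = (
    "success",
    "algorithm",
    "num_clusters",
    "num_documents",
    "centroids",
    "execution_time_ms",
)


def summarize_response(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Validate required fields and return a compact summary of a run."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    # metrics is optional, so fall back to an empty mapping.
    metrics = payload.get("metrics") or {}
    return {
        "ok": bool(payload["success"]),
        "algorithm": payload["algorithm"],
        "num_clusters": payload["num_clusters"],
        "num_documents": payload["num_documents"],
        "seconds": payload["execution_time_ms"] / 1000.0,
        # Assumption: centroid keys follow the example response.
        "cluster_sizes": {
            c["cluster_id"]: c.get("num_members") for c in payload["centroids"]
        },
        "silhouette_score": metrics.get("silhouette_score"),
        "parquet_path": payload.get("parquet_path"),
    }


# Trimmed copy of the example response shown at the top of this page.
example = {
    "algorithm": "kmeans",
    "centroids": [
        {"cluster_id": "cluster_001", "feature_dimensions": 512, "num_members": 2000}
    ],
    "execution_time_ms": 5432,
    "metrics": {"davies_bouldin_score": 0.8, "silhouette_score": 0.65},
    "num_clusters": 5,
    "num_documents": 10000,
    "run_id": "run_xyz789",
    "success": True,
}
summary = summarize_response(example)
```

Optional fields such as `parquet_path` come back as `None` in the summary when the response does not include them.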