POST /v1/clusters/execute
Execute Clustering
curl --request POST \
  --url https://api.mixpeek.com/v1/clusters/execute \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --header 'X-Namespace: <x-namespace>' \
  --data '{
  "collection_ids": [
    "col_products_v1",
    "col_products_v2"
  ],
  "compute_metrics": true,
  "config": {
    "algorithm": "kmeans",
    "algorithm_params": {
      "max_iter": 300,
      "n_clusters": 5
    },
    "feature_vector": {
      "vector_name": "text_extractor_v1_embedding"
    },
    "normalize_features": true
  },
  "include_members": false,
  "sample_size": 10000,
  "store_results": true
}'
{
  "algorithm": "kmeans",
  "centroids": [
    {
      "centroid_vector": [
        0.1,
        0.2,
        0.3
      ],
      "cluster_id": "cluster_001",
      "feature_dimensions": 512,
      "num_members": 2000,
      "vector_name": "product_embedding"
    }
  ],
  "created_at": "2024-01-15T10:30:00Z",
  "execution_time_ms": 5432,
  "members_key": "int_abc123/ns_xyz789/engine_cluster_build/run_xyz789/members.parquet",
  "metrics": {
    "davies_bouldin_score": 0.8,
    "silhouette_score": 0.65
  },
  "num_clusters": 5,
  "num_documents": 10000,
  "parquet_path": "int_abc123/ns_xyz789/engine_cluster_build/run_xyz789/clusters.parquet",
  "run_id": "run_xyz789",
  "success": true
}

Authorizations

Authorization
string
header
required

Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings.

Headers

Authorization
string
required

Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer sk_live_abc123def456"

"Bearer sk_test_xyz789"

X-Namespace
string
required

Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace ID (format: ns_xxxxxxxxxxxxx) or the namespace name (a custom name such as 'my-namespace').

Examples:

"ns_abc123def456"

"production"

"my-namespace"
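
The header requirements above can be sketched as a small helper. This is a minimal illustration, not part of any Mixpeek SDK: the function name `build_headers` is hypothetical, and only the three header names and the 'Bearer' format come from this page.

```python
def build_headers(api_key: str, namespace: str) -> dict[str, str]:
    """Return the three headers used in the curl example on this page.

    `build_headers` is a hypothetical helper name; the header names and
    the 'Bearer <key>' format are taken from the documentation above.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if not namespace:
        raise ValueError("namespace must be a non-empty name or ID")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Either a namespace ID (ns_...) or a custom namespace name.
        "X-Namespace": namespace,
    }
```

For example, `build_headers("sk_test_xyz789", "production")` yields an `Authorization` value of `Bearer sk_test_xyz789`.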

Body

application/json

Request to execute clustering on one or more collections.

collection_ids
string[]
required

IDs of the collections to cluster together

Minimum length: 1

config
object
required

Clustering configuration including algorithm and parameters

Examples:

{
  "algorithm": "kmeans",
  "algorithm_params": {
    "max_iter": 300,
    "n_clusters": 5,
    "random_state": 42
  },
  "description": "Vector-based clustering with K-means",
  "feature_vector": {
    "feature_address": "mixpeek://text_extractor@v1/text_extractor_v1_embedding"
  },
  "llm_labeling": {
    "enabled": true,
    "model_name": "gpt-4o-mini",
    "provider": "openai"
  },
  "normalize_features": true
}

{
  "algorithm": "hdbscan",
  "algorithm_params": { "min_cluster_size": 10, "min_samples": 5 },
  "description": "Vector-based clustering with HDBSCAN",
  "feature_vector": {
    "feature_address": "mixpeek://image_extractor@v1/image_extractor_v1_embedding"
  },
  "normalize_features": false
}

{
  "algorithm": "attribute_based",
  "attribute_config": {
    "attributes": ["category"],
    "hierarchical_grouping": false
  },
  "description": "Attribute-based clustering (simple category)",
  "llm_labeling": {
    "enabled": true,
    "include_keywords": true,
    "include_summary": true,
    "provider": "openai"
  }
}

{
  "algorithm": "attribute_based",
  "attribute_config": {
    "aggregation_method": "most_frequent",
    "attributes": ["category", "brand"],
    "hierarchical_grouping": true
  },
  "description": "Attribute-based clustering (hierarchical category → brand)"
}

{
  "algorithm": "attribute_based",
  "attribute_config": {
    "attributes": ["metadata.status", "metadata.priority"],
    "hierarchical_grouping": false
  },
  "description": "Attribute-based clustering (nested attributes)"
}

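
The config examples above fall into two shapes: vector-based (an algorithm name plus `algorithm_params` and `feature_vector`) and attribute-based (`attribute_config`). The sketch below builds one of each. The helper names `kmeans_config` and `attribute_config` are hypothetical, and the `max_iter=300` default simply mirrors the example values; it is an assumption of this sketch, not a documented API default.

```python
from typing import Any


def kmeans_config(feature_address: str, n_clusters: int,
                  max_iter: int = 300, normalize: bool = True) -> dict[str, Any]:
    """Build a vector-based K-means config shaped like the first example.

    Hypothetical helper; the default max_iter mirrors the example only.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    return {
        "algorithm": "kmeans",
        "algorithm_params": {"max_iter": max_iter, "n_clusters": n_clusters},
        "feature_vector": {"feature_address": feature_address},
        "normalize_features": normalize,
    }


def attribute_config(attributes: list[str],
                     hierarchical: bool = False) -> dict[str, Any]:
    """Build an attribute-based config shaped like the later examples.

    Hypothetical helper; nested attributes use dotted paths such as
    "metadata.status", as in the last example.
    """
    if not attributes:
        raise ValueError("at least one attribute is required")
    return {
        "algorithm": "attribute_based",
        "attribute_config": {
            "attributes": list(attributes),
            "hierarchical_grouping": hierarchical,
        },
    }
```

Optional keys shown in the examples, such as `description` and `llm_labeling`, can be added to the returned dictionary before sending.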
namespace_id
string | null

Namespace ID for the request

internal_id
string | null

Internal ID for the request

sample_size
integer | null

Number of documents to sample for clustering

store_results
boolean
default:true

Whether to store clustering results

include_members
boolean
default:false

Whether to include cluster membership in results

compute_metrics
boolean
default:true

Whether to compute clustering quality metrics

save_artifacts
boolean
default:false

Whether to save clustering artifacts (e.g., parquet) to S3
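
Putting the body fields together, a request could be assembled as follows. This is a sketch using only the Python standard library: `build_body` and `build_request` are hypothetical names, the boolean defaults are the ones listed above, and `namespace_id` and `internal_id` are left out for brevity. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Any, Optional

API_URL = "https://api.mixpeek.com/v1/clusters/execute"


def build_body(collection_ids: list[str], config: dict[str, Any], *,
               sample_size: Optional[int] = None,
               store_results: bool = True,
               include_members: bool = False,
               compute_metrics: bool = True,
               save_artifacts: bool = False) -> dict[str, Any]:
    """Assemble the JSON body, using the defaults documented above."""
    if not collection_ids:
        raise ValueError("collection_ids needs at least one collection ID")
    body: dict[str, Any] = {
        "collection_ids": list(collection_ids),
        "config": config,
        "store_results": store_results,
        "include_members": include_members,
        "compute_metrics": compute_metrics,
        "save_artifacts": save_artifacts,
    }
    if sample_size is not None:  # omit the nullable field when unset
        body["sample_size"] = sample_size
    return body


def build_request(api_key: str, namespace: str,
                  body: dict[str, Any]) -> urllib.request.Request:
    """Wrap the body in a POST request object without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-Namespace": namespace,
        },
        method="POST",
    )

# To send (requires network access and a valid API key):
# with urllib.request.urlopen(build_request(key, ns, body)) as resp:
#     result = json.load(resp)
```

Keeping body construction separate from sending makes the payload easy to inspect or log before any network call is made.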

Response

Successful Response

Response from cluster execution.

success
boolean
required

Whether clustering was successful

algorithm
enum<string>
required

Algorithm used for clustering

Available options: kmeans, dbscan, hdbscan, agglomerative, spectral, gaussian_mixture, mean_shift, optics, attribute_based

num_clusters
integer
required

Number of clusters found

num_documents
integer
required

Number of documents clustered

centroids
ClusterCentroid · object[]
required

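Cluster centroids, one object per cluster. In the example response, each entry carries centroid_vector, cluster_id, feature_dimensions, num_members, and vector_name.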

execution_time_ms
integer
required

Total execution time in milliseconds

run_id
string

Unique identifier for this clustering run

metrics
object

Clustering quality metrics

parquet_path
string | null

S3 key of the Parquet file containing the full results

members_key
string | null

S3 key to members.parquet (if saved)

created_at
string<date-time>

Timestamp of clustering
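
The response fields above could be checked and loaded on the client side roughly as follows. This is a sketch: `ClusterRun` and `parse_cluster_response` are hypothetical names, the required/optional split follows the field list on this page, and the algorithm set copies the documented enum values.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional

# Enum values copied from the `algorithm` field documentation.
KNOWN_ALGORITHMS = frozenset({
    "kmeans", "dbscan", "hdbscan", "agglomerative", "spectral",
    "gaussian_mixture", "mean_shift", "optics", "attribute_based",
})

# Fields marked "required" in the response schema.
REQUIRED = ("success", "algorithm", "num_clusters", "num_documents",
            "centroids", "execution_time_ms")


@dataclass
class ClusterRun:
    success: bool
    algorithm: str
    num_clusters: int
    num_documents: int
    centroids: list[dict[str, Any]]
    execution_time_ms: int
    run_id: Optional[str] = None
    metrics: Optional[dict[str, Any]] = None
    parquet_path: Optional[str] = None
    members_key: Optional[str] = None
    created_at: Optional[str] = None


def parse_cluster_response(payload: dict[str, Any]) -> ClusterRun:
    """Validate required keys and the algorithm name, then build a ClusterRun."""
    missing = [key for key in REQUIRED if key not in payload]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    if payload["algorithm"] not in KNOWN_ALGORITHMS:
        raise ValueError(f"unknown algorithm: {payload['algorithm']!r}")
    # Unknown extra keys are ignored; absent optional keys become None.
    names = [f.name for f in fields(ClusterRun)]
    return ClusterRun(**{name: payload.get(name) for name in names})
```

With the example response at the top of this page, `parse_cluster_response(...)` would give a `ClusterRun` whose `metrics["silhouette_score"]` is `0.65` and whose `run_id` is `"run_xyz789"`.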