Execute Clustering

curl --request POST \
  --url https://api.mixpeek.com/v1/clusters/execute \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --header 'X-Namespace: <x-namespace>' \
  --data '{
  "collection_ids": [
    "col_products_v1",
    "col_products_v2"
  ],
  "compute_metrics": true,
  "config": {
    "algorithm": "kmeans",
    "algorithm_params": {
      "max_iter": 300,
      "n_clusters": 5
    },
    "feature_vector": {
      "vector_name": "text_extractor_v1_embedding"
    },
    "normalize_features": true
  },
  "include_members": false,
  "sample_size": 10000,
  "store_results": true
}'

{
  "algorithm": "kmeans",
  "centroids": [
    {
      "centroid_vector": [
        0.1,
        0.2,
        0.3
      ],
      "cluster_id": "cluster_001",
      "feature_dimensions": 512,
      "num_members": 2000,
      "vector_name": "text_extractor_v1_embedding"
    }
  ],
  "created_at": "2024-01-15T10:30:00Z",
  "execution_time_ms": 5432,
  "members_key": "int_abc123/ns_xyz789/engine_cluster_build/run_xyz789/members.parquet",
  "metrics": {
    "davies_bouldin_score": 0.8,
    "silhouette_score": 0.65
  },
  "num_clusters": 5,
  "num_documents": 10000,
  "parquet_path": "int_abc123/ns_xyz789/engine_cluster_build/run_xyz789/clusters.parquet",
  "run_id": "run_xyz789",
  "success": true
}

POST /v1/clusters/execute

Authorizations

Authorization

string

header

required

Bearer token authentication using your API key. Format: 'Bearer your_api_key'. To get an API key, create an account at mixpeek.com/start and generate a key in your account settings.

Headers

Authorization

string

required

Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer sk_live_abc123def456"

"Bearer sk_test_xyz789"

X-Namespace

string

required

Namespace identifier that scopes this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. Provide either the namespace ID (format: ns_xxxxxxxxxxxxx) or the namespace name (for example, 'my-namespace').

Examples:

"ns_abc123def456"

"production"

"my-namespace"
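
As an illustration of the two required headers described above, here is a minimal Python sketch that assembles them. The helper name `build_headers` is hypothetical and not part of any Mixpeek SDK; the key and namespace values are the placeholder examples from this page.

```python
def build_headers(api_key: str, namespace: str) -> dict[str, str]:
    """Return the headers this endpoint requires (hypothetical helper)."""
    if not api_key:
        raise ValueError("api_key must be non-empty")
    if not namespace:
        raise ValueError("namespace must be non-empty")
    return {
        # Bearer token format, as described under Authorization.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Either a namespace ID (ns_...) or a namespace name.
        "X-Namespace": namespace,
    }


headers = build_headers("sk_test_xyz789", "my-namespace")
```

The same dictionary can be passed to whichever HTTP client you prefer.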

Body

application/json

Request to execute clustering on one or more collections.

collection_ids

string[]

required

IDs of the collections to cluster together. The array must contain at least one ID.

config

object

required

Clustering configuration including algorithm and parameters

Examples:

{
  "algorithm": "kmeans",
  "algorithm_params": {
    "max_iter": 300,
    "n_clusters": 5,
    "random_state": 42
  },
  "description": "Vector-based clustering with K-means",
  "feature_vector": {
    "feature_address": "mixpeek://text_extractor@v1/text_extractor_v1_embedding"
  },
  "llm_labeling": {
    "enabled": true,
    "model_name": "gpt-4o-mini",
    "provider": "openai"
  },
  "normalize_features": true
}

{
  "algorithm": "hdbscan",
  "algorithm_params": { "min_cluster_size": 10, "min_samples": 5 },
  "description": "Vector-based clustering with HDBSCAN",
  "feature_vector": {
    "feature_address": "mixpeek://image_extractor@v1/image_extractor_v1_embedding"
  },
  "normalize_features": false
}

{
  "algorithm": "attribute_based",
  "attribute_config": {
    "attributes": ["category"],
    "hierarchical_grouping": false
  },
  "description": "Attribute-based clustering (simple category)",
  "llm_labeling": {
    "enabled": true,
    "include_keywords": true,
    "include_summary": true,
    "provider": "openai"
  }
}

{
  "algorithm": "attribute_based",
  "attribute_config": {
    "aggregation_method": "most_frequent",
    "attributes": ["category", "brand"],
    "hierarchical_grouping": true
  },
  "description": "Attribute-based clustering (hierarchical category → brand)"
}

{
  "algorithm": "attribute_based",
  "attribute_config": {
    "attributes": ["metadata.status", "metadata.priority"],
    "hierarchical_grouping": false
  },
  "description": "Attribute-based clustering (nested attributes)"
}
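
The examples above fall into two shapes: vector-based configs (an algorithm plus `algorithm_params` and `feature_vector`) and attribute-based configs (an `attribute_config` block). The following Python sketch builds both shapes as plain dictionaries. The function names are hypothetical, only fields shown in the examples are used, and which parameters each algorithm accepts is not verified here.

```python
def vector_config(algorithm: str, feature_address: str,
                  algorithm_params: dict, normalize_features: bool = True) -> dict:
    """Build a vector-based clustering config (hypothetical helper)."""
    return {
        "algorithm": algorithm,
        "algorithm_params": dict(algorithm_params),
        "feature_vector": {"feature_address": feature_address},
        "normalize_features": normalize_features,
    }


def attribute_based_config(attributes: list[str],
                           hierarchical_grouping: bool = False) -> dict:
    """Build an attribute-based clustering config (hypothetical helper)."""
    if not attributes:
        raise ValueError("at least one attribute is needed")
    return {
        "algorithm": "attribute_based",
        "attribute_config": {
            "attributes": list(attributes),
            "hierarchical_grouping": hierarchical_grouping,
        },
    }


# Mirrors the K-means example above.
kmeans = vector_config(
    "kmeans",
    "mixpeek://text_extractor@v1/text_extractor_v1_embedding",
    {"max_iter": 300, "n_clusters": 5, "random_state": 42},
)

# Mirrors the hierarchical category -> brand example above.
by_category = attribute_based_config(["category", "brand"], hierarchical_grouping=True)
```

Either dictionary can be used as the `config` value of the request body.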

namespace_id

string | null

Namespace ID for the request

internal_id

string | null

Internal ID for the request

sample_size

integer | null

Number of documents to sample for clustering

store_results

boolean

default: true

Whether to store clustering results

include_members

boolean

default: false

Whether to include cluster membership in results

compute_metrics

boolean

default: true

Whether to compute clustering quality metrics

save_artifacts

boolean

default: false

Whether to save clustering artifacts (e.g., parquet) to S3
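
To show how the body fields fit together, here is a minimal Python sketch that assembles the request using only the standard library. It builds the request object without sending it. The function name `build_execute_request` is hypothetical, the keyword defaults copy the defaults documented above, and omitting `sample_size` when it is not set is an assumption about how the API treats a missing optional field.

```python
import json
import urllib.request

URL = "https://api.mixpeek.com/v1/clusters/execute"


def build_execute_request(api_key, namespace, collection_ids, config, *,
                          sample_size=None, store_results=True,
                          include_members=False, compute_metrics=True,
                          save_artifacts=False):
    """Build (but do not send) a POST request for this endpoint."""
    if len(collection_ids) < 1:
        raise ValueError("collection_ids needs at least one ID")
    body = {
        "collection_ids": list(collection_ids),
        "config": config,
        "store_results": store_results,
        "include_members": include_members,
        "compute_metrics": compute_metrics,
        "save_artifacts": save_artifacts,
    }
    if sample_size is not None:
        # Assumption: the field may simply be left out when unset.
        body["sample_size"] = sample_size
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-Namespace": namespace,
        },
    )


req = build_execute_request(
    "sk_test_xyz789",
    "my-namespace",
    ["col_products_v1", "col_products_v2"],
    {"algorithm": "kmeans", "algorithm_params": {"n_clusters": 5}},
    sample_size=10000,
)
# urllib.request.urlopen(req) would send the request.
```

Keeping construction separate from sending makes the payload easy to inspect or log before any network call is made.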

Response

Successful Response

Response from cluster execution.

success

boolean

required

Whether clustering was successful

algorithm

enum<string>

required

Algorithm used for clustering

Available options: kmeans, dbscan, hdbscan, agglomerative, spectral, gaussian_mixture, mean_shift, optics, attribute_based

num_clusters

integer

required

Number of clusters found

num_documents

integer

required

Number of documents clustered

centroids

ClusterCentroid · object[]

required

Cluster centroids with features

execution_time_ms

integer

required

Total execution time in milliseconds

run_id

string

Unique identifier for this clustering run

metrics

object

Clustering quality metrics

parquet_path

string | null

S3 key path to parquet file with full results

members_key

string | null

S3 key to members.parquet (if saved)

created_at

string<date-time>

Timestamp of clustering
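
As a sketch of how a client might read these response fields, the Python below summarizes a parsed response. The `summarize` function is hypothetical, and the sample dictionary reuses values from the example response on this page. On interpreting the metrics: a silhouette score lies between -1 and 1 with higher values indicating better-separated clusters, while a lower Davies-Bouldin score is better.

```python
def summarize(resp: dict) -> dict:
    """Pull the headline numbers out of a parsed response (hypothetical helper)."""
    if not resp.get("success"):
        raise RuntimeError("clustering run did not succeed")
    centroids = resp["centroids"]
    # The centroid with the most members, if any centroids were returned.
    largest = max(centroids, key=lambda c: c["num_members"]) if centroids else None
    # metrics is optional, so fall back to an empty mapping.
    metrics = resp.get("metrics") or {}
    return {
        "run_id": resp.get("run_id"),
        "num_clusters": resp["num_clusters"],
        "largest_cluster": largest["cluster_id"] if largest else None,
        "silhouette_score": metrics.get("silhouette_score"),
        "davies_bouldin_score": metrics.get("davies_bouldin_score"),
        "execution_time_ms": resp["execution_time_ms"],
    }


# Values taken from the example response on this page.
sample = {
    "success": True,
    "algorithm": "kmeans",
    "num_clusters": 5,
    "num_documents": 10000,
    "centroids": [{"cluster_id": "cluster_001", "num_members": 2000}],
    "execution_time_ms": 5432,
    "run_id": "run_xyz789",
    "metrics": {"davies_bouldin_score": 0.8, "silhouette_score": 0.65},
}

summary = summarize(sample)
```

Optional fields are read with `.get` so the sketch tolerates their absence; required fields are indexed directly so a malformed response fails loudly.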
