Partially Update Cluster
PATCH /v1/clusters/{cluster_identifier}

Example request:
curl --request PATCH \
  --url https://api.mixpeek.com/v1/clusters/{cluster_identifier} \
  --header 'Authorization: Bearer <api_key>' \
  --header 'Content-Type: application/json' \
  --header 'X-Namespace: <x-namespace>' \
  --data '
{
  "cluster_name": "<string>",
  "description": "<string>",
  "metadata": {}
}
'

Example response:
{
  "collection_ids": [
    "<string>"
  ],
  "cluster_name": "<string>",
  "cluster_type": "vector",
  "vector_config": {
    "algorithm_params": {
      "min_cluster_size": 10,
      "min_samples": 5
    },
    "clustering_method": "hdbscan",
    "description": "HDBSCAN clustering with multimodal embeddings",
    "feature_uri": "mixpeek://multimodal_extractor@v1/multimodal_embedding",
    "sample_size": 1000
  },
  "attribute_config": {
    "attributes": [
      "category"
    ],
    "description": "Simple category clustering",
    "hierarchical_grouping": false
  },
  "filters": {
    "AND": [
      {
        "field": "name",
        "operator": "eq",
        "value": "John"
      },
      {
        "field": "age",
        "operator": "gte",
        "value": 30
      }
    ],
    "OR": [
      {
        "field": "status",
        "operator": "eq",
        "value": "active"
      },
      {
        "field": "role",
        "operator": "eq",
        "value": "admin"
      }
    ],
    "NOT": [
      {
        "field": "department",
        "operator": "eq",
        "value": "HR"
      },
      {
        "field": "location",
        "operator": "eq",
        "value": "remote"
      }
    ],
    "case_sensitive": true
  },
  "llm_labeling": {
    "description": "Text-only labeling with multiple fields",
    "enabled": true,
    "include_keywords": true,
    "include_summary": true,
    "labeling_inputs": {
      "input_mappings": [
        {
          "input_key": "title",
          "path": "title",
          "source_type": "payload"
        },
        {
          "input_key": "description",
          "path": "description",
          "source_type": "payload"
        },
        {
          "input_key": "text",
          "path": "text",
          "source_type": "payload"
        }
      ]
    },
    "model_name": "gpt-4o-mini-2024-07-18",
    "provider": "openai"
  },
  "enrich_source_collection": false,
  "source_enrichment_config": {
    "field_mappings": [
      {
        "source_field": "cluster_id",
        "target_field": "category_id"
      },
      {
        "source_field": "cluster_label",
        "target_field": "category_name"
      },
      {
        "source_field": "distance_to_centroid",
        "target_field": "category_confidence"
      }
    ]
  },
  "cluster_id": "<string>",
  "parquet_path": "<string>",
  "members_key": "<string>",
  "num_clusters": 123,
  "cluster_stats": {
    "num_clusters": 123,
    "noise_points": 123,
    "silhouette_score": 123,
    "extra": {}
  },
  "status": "PENDING",
  "task_id": "<string>",
  "last_run_id": "<string>",
  "created_at": "2023-11-07T05:31:56Z",
  "updated_at": "2023-11-07T05:31:56Z",
  "metadata": {}
}
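For readers calling the endpoint from Python, the curl request above can be built with the standard library alone. This is a minimal sketch, not an official client. `build_patch_request` is a hypothetical helper, the base URL and headers are copied from the curl example, and percent-encoding the identifier is a precaution that assumes a cluster name may contain spaces or slashes.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.mixpeek.com/v1"  # base URL taken from the curl example


def build_patch_request(cluster_identifier, api_key, namespace, updates):
    """Build (but do not send) a PATCH /v1/clusters/{cluster_identifier} request.

    Hypothetical helper for illustration; it is not part of any Mixpeek SDK.
    """
    # Percent-encode the ID or name so spaces or "/" stay inside one path segment.
    path_id = urllib.parse.quote(cluster_identifier, safe="")
    body = json.dumps(updates).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/clusters/{path_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",  # documented 'Bearer sk_...' format
            "Content-Type": "application/json",
            "X-Namespace": namespace,  # namespace ID or name
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; on success the response body is the cluster object described under Response.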

Headers

Authorization
string
required

Bearer token authentication using your API key, in the format 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

X-Namespace
string
required

Namespace identifier that scopes this request. All resources (collections, buckets, taxonomies, etc.) belong to a namespace. You can pass either the namespace ID (format: ns_xxxxxxxxxxxxx) or its name (for example, 'my-namespace').

Path Parameters

cluster_identifier
string
required

Cluster ID or name

Body

application/json

Request body for partially updating a cluster. Every field is optional; include only the fields you want to change.

cluster_name
string | null

Updated name for the cluster

description
string | null

Updated description for the cluster

metadata
Metadata · object

Updated metadata for the cluster
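Because this is a partial update, a client typically sends only the fields it intends to change. The sketch below assumes that omitted fields are left untouched. This reference does not say what an explicit null does (for example, whether it clears the description), so the hypothetical `build_update_body` helper simply drops arguments that were not supplied.

```python
def build_update_body(cluster_name=None, description=None, metadata=None):
    """Return a PATCH body containing only the fields the caller supplied.

    Assumption: omitting a field leaves it unchanged on the server. Explicit
    nulls are never sent, because their effect is not documented here.
    """
    candidates = {
        "cluster_name": cluster_name,
        "description": description,
        "metadata": metadata,
    }
    return {key: value for key, value in candidates.items() if value is not None}
```

For example, `build_update_body(description="Q3 product images")` yields a body with only the `description` key.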

Response

Successful Response

The updated cluster record (cluster metadata as stored in MongoDB).

collection_ids
string[] | null

IDs of the collections to cluster together

Minimum array length: 1
cluster_name
string | null

Optional human-friendly name for the clustering job

cluster_type
enum<string>
default:vector

Clustering type: 'vector' clusters documents by their embeddings; 'attribute' groups them by attribute values.

Available options:
vector,
attribute
vector_config
VectorBasedConfig · object

Required when cluster_type is 'vector'

Example:
{
  "algorithm_params": { "min_cluster_size": 10, "min_samples": 5 },
  "clustering_method": "hdbscan",
  "description": "HDBSCAN clustering with multimodal embeddings",
  "feature_uri": "mixpeek://multimodal_extractor@v1/multimodal_embedding",
  "sample_size": 1000
}
attribute_config
AttributeBasedConfig · object

Required when cluster_type is 'attribute'

Example:
{
  "attributes": ["category"],
  "description": "Simple category clustering",
  "hierarchical_grouping": false
}
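The conditional requirements on cluster_type, vector_config, attribute_config, and collection_ids can be checked client-side before a cluster definition is submitted. The hypothetical `check_cluster_config` below mirrors only the rules stated in this reference; the server's own validation may differ or be stricter.

```python
def check_cluster_config(cluster):
    """Return a list of problems found in a cluster definition (empty if none).

    Hypothetical client-side check mirroring the documented rules: vector_config
    is required for cluster_type 'vector', attribute_config for 'attribute',
    and collection_ids, when present, must contain at least one ID.
    """
    problems = []
    cluster_type = cluster.get("cluster_type", "vector")  # documented default
    if cluster_type not in ("vector", "attribute"):
        problems.append(f"unknown cluster_type: {cluster_type!r}")
    elif cluster_type == "vector" and not cluster.get("vector_config"):
        problems.append("vector_config is required when cluster_type is 'vector'")
    elif cluster_type == "attribute" and not cluster.get("attribute_config"):
        problems.append("attribute_config is required when cluster_type is 'attribute'")
    collection_ids = cluster.get("collection_ids")
    if collection_ids is not None and len(collection_ids) < 1:
        problems.append("collection_ids must contain at least one collection ID")
    return problems
```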
filters
LogicalOperator · object

Optional filters applied to documents before clustering, using the same format as the list-documents endpoint. They are applied while scrolling the Qdrant collection, before the data is exported to Parquet. Use them to cluster a subset of documents, for example those where status='active' or category='electronics'.
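The exact evaluation semantics of the filter object are not spelled out here. Because the filters are applied during a Qdrant scroll, a plausible reading is Qdrant-style logic: every AND condition must hold, at least one OR condition must hold, and no NOT condition may hold. The sketch below evaluates a filter locally under that assumption. `matches` is a hypothetical helper, it supports only the `eq` and `gte` operators that appear in the example, and treating a missing case_sensitive flag as true is also an assumption.

```python
import operator

# Only the operators used in the documented example; others would need adding.
OPERATORS = {"eq": operator.eq, "gte": operator.ge}


def _condition_matches(doc, cond, case_sensitive):
    value = doc.get(cond["field"])
    expected = cond["value"]
    if value is None:
        return False  # a missing field never satisfies a condition
    if not case_sensitive and isinstance(value, str) and isinstance(expected, str):
        value, expected = value.lower(), expected.lower()
    return OPERATORS[cond["operator"]](value, expected)


def matches(doc, filters):
    """Assumed semantics: AND = all hold, OR = at least one holds (if any given),
    NOT = none hold; the three groups are combined with AND."""
    cs = filters.get("case_sensitive", True)  # assumed default
    and_ok = all(_condition_matches(doc, c, cs) for c in filters.get("AND", []))
    or_conds = filters.get("OR", [])
    or_ok = not or_conds or any(_condition_matches(doc, c, cs) for c in or_conds)
    not_ok = not any(_condition_matches(doc, c, cs) for c in filters.get("NOT", []))
    return and_ok and or_ok and not_ok
```

With the example filter from the response, a document for John, aged 31, with status 'active' in the Sales department would match, while the same document in HR would not.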

llm_labeling
LLMLabeling · object

Optional configuration for LLM-based cluster labeling. When present with enabled=true, each cluster receives a semantic label generated by an LLM instead of a generic label such as 'Cluster 0'. When omitted or when enabled=false, the generic fallback labels are used.

Example:
{
  "description": "Text-only labeling with multiple fields",
  "enabled": true,
  "include_keywords": true,
  "include_summary": true,
  "labeling_inputs": {
    "input_mappings": [
      {
        "input_key": "title",
        "path": "title",
        "source_type": "payload"
      },
      {
        "input_key": "description",
        "path": "description",
        "source_type": "payload"
      },
      {
        "input_key": "text",
        "path": "text",
        "source_type": "payload"
      }
    ]
  },
  "model_name": "gpt-4o-mini-2024-07-18",
  "provider": "openai"
}
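Each entry in labeling_inputs.input_mappings pairs a value taken from the document (source_type and path) with the key it is passed under (input_key). The sketch below shows one way to assemble those inputs for a single document, assuming `path` is a dot-separated path into the document payload. `collect_labeling_inputs` is hypothetical and ignores source types other than payload.

```python
def collect_labeling_inputs(document, input_mappings):
    """Assemble the LLM labeling inputs for one document.

    Assumptions: only source_type "payload" is handled, and "path" is read as a
    dot-separated path into the document payload. Missing paths are skipped.
    """
    inputs = {}
    for mapping in input_mappings:
        if mapping.get("source_type") != "payload":
            continue  # other source types are not covered by this sketch
        value = document
        for part in mapping["path"].split("."):
            if not isinstance(value, dict) or part not in value:
                value = None
                break
            value = value[part]
        if value is not None:
            inputs[mapping["input_key"]] = value
    return inputs
```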
enrich_source_collection
boolean
default:false

If true, cluster results are written back in place to the source collection(s) instead of to new output collections. Each document is enriched with cluster_id, cluster_label, distance_to_centroid, and optionally other metadata, following the same pattern as taxonomy enrichment.

source_enrichment_config
SourceEnrichmentConfig · object

Configuration for source-collection enrichment, used only when enrich_source_collection is true. Controls which cluster fields are added to source documents and what they are named.

Example:
{
  "field_mappings": [
    {
      "source_field": "cluster_id",
      "target_field": "category_id"
    },
    {
      "source_field": "cluster_label",
      "target_field": "category_name"
    },
    {
      "source_field": "distance_to_centroid",
      "target_field": "category_confidence"
    }
  ]
}
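With the example configuration above, enrichment would write the cluster_id, cluster_label, and distance_to_centroid values onto each source document under the names category_id, category_name, and category_confidence. The hypothetical `enrichment_fields` helper illustrates that renaming for a single cluster result; whether unmapped cluster fields are also written is not stated here.

```python
def enrichment_fields(cluster_result, field_mappings):
    """Rename cluster result fields according to source_enrichment_config.

    Hypothetical helper: each source_field value in the cluster result is
    emitted under its target_field name. Fields absent from the result are skipped.
    """
    return {
        m["target_field"]: cluster_result[m["source_field"]]
        for m in field_mappings
        if m["source_field"] in cluster_result
    }
```

Merging the output into a document (for example `{**doc, **enrichment_fields(result, mappings)}`) approximates what an in-place enrichment would add.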
cluster_id
string

Unique cluster identifier

parquet_path
string | null

S3 path to the Parquet files containing the cluster data

members_key
string | null

S3 key of the members.parquet file, if it was saved

num_clusters
integer | null

Number of clusters found

cluster_stats
ClusterStats · object

Clustering quality metrics

status
enum<string>
default:PENDING

Clustering job status

Available options:
PENDING,
IN_PROGRESS,
PROCESSING,
COMPLETED,
COMPLETED_WITH_ERRORS,
FAILED,
CANCELED,
UNKNOWN,
SKIPPED,
DRAFT,
ACTIVE,
ARCHIVED,
SUSPENDED
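This reference lists the possible status values but does not say which ones are final. A common client pattern is to poll until the job leaves its in-flight states. The sketch below assumes PENDING, IN_PROGRESS, and PROCESSING are the in-flight states and treats every other value as settled. `wait_for_cluster` and the `fetch_status` callable it takes are hypothetical; the caller decides how to fetch the current status.

```python
import time

# Assumed grouping: statuses during which the clustering job is still running.
IN_FLIGHT = {"PENDING", "IN_PROGRESS", "PROCESSING"}


def wait_for_cluster(fetch_status, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll until the status leaves IN_FLIGHT, then return it.

    fetch_status is a caller-supplied callable (hypothetical) returning the
    cluster's current "status" string. Raises TimeoutError if the job is still
    in flight once timeout_seconds have elapsed.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status not in IN_FLIGHT:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```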
task_id
string | null

ID of the task associated with the clustering job

last_run_id
string | null

Run ID of the most recent successful clustering execution. Used to retrieve execution results.

created_at
string<date-time>

When the cluster was created

updated_at
string<date-time>

When the cluster was last updated

metadata
Metadata · object

Additional user-defined metadata for the cluster
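A client usually needs only a handful of these fields. The sketch below extracts a few of them from a response body into a small dataclass; `ClusterSummary` and `summarize_cluster` are illustrative names, not part of a Mixpeek SDK. The trailing `Z` in the timestamps is replaced with `+00:00` because `datetime.fromisoformat` accepts `Z` only from Python 3.11 onward.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ClusterSummary:
    cluster_id: str
    cluster_name: Optional[str]
    status: str
    num_clusters: Optional[int]
    updated_at: datetime


def _parse_timestamp(value):
    # Normalise the "Z" suffix so older Python versions can parse it too.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_cluster(response_text):
    """Pick a few documented fields out of the response body (illustrative only)."""
    data = json.loads(response_text)
    return ClusterSummary(
        cluster_id=data["cluster_id"],
        cluster_name=data.get("cluster_name"),
        status=data.get("status", "PENDING"),  # documented default
        num_clusters=data.get("num_clusters"),
        updated_at=_parse_timestamp(data["updated_at"]),
    )
```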