Get Cluster

curl --request GET \
  --url https://api.mixpeek.com/v1/clusters/{cluster_identifier} \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'X-Namespace: YOUR_NAMESPACE'

{
  "cluster_name": "<string>",
  "namespace_id": "<string>",
  "input_collections": [
    "<string>"
  ],
  "cluster_type": "vector",
  "cluster_id": "<string>",
  "source_bucket_ids": [
    "<string>"
  ],
  "filters": {},
  "feature_uris": [
    "<string>"
  ],
  "multi_feature_strategy": "<string>",
  "learned_weights": {},
  "learning_quality_score": 123,
  "effective_feature_method": "<string>",
  "clustered_attributes": [
    "<string>"
  ],
  "hierarchical_grouping": true,
  "aggregation_method": "<string>",
  "output_collection_ids": [
    "<string>"
  ],
  "output_collection_names": [
    "<string>"
  ],
  "algorithm": "<string>",
  "algorithm_params": {},
  "enrich_source": false,
  "source_enrichment_config": {
    "field_mappings": [
      {
        "source_field": "cluster_id",
        "target_field": "category_id"
      },
      {
        "source_field": "cluster_label",
        "target_field": "category_name"
      },
      {
        "source_field": "distance_to_centroid",
        "target_field": "category_confidence"
      }
    ]
  },
  "llm_labeling": {
    "description": "Text-only labeling with multiple fields",
    "enabled": true,
    "include_keywords": true,
    "include_summary": true,
    "labeling_inputs": {
      "input_mappings": [
        {
          "input_key": "title",
          "path": "title",
          "source_type": "payload"
        },
        {
          "input_key": "description",
          "path": "description",
          "source_type": "payload"
        },
        {
          "input_key": "text",
          "path": "text",
          "source_type": "payload"
        }
      ]
    },
    "model_name": "gpt-4o-mini-2024-07-18",
    "provider": "openai"
  },
  "num_clusters": 123,
  "num_documents_clustered": 123,
  "execution_time_seconds": 123,
  "hierarchy_detected": false,
  "parent_cluster_id": "<string>",
  "child_cluster_ids": [
    "<string>"
  ],
  "hierarchy_relationships": [
    {}
  ],
  "status": "PENDING",
  "last_execution_task_id": "<string>",
  "created_at": "2023-11-07T05:31:56Z",
  "updated_at": "2023-11-07T05:31:56Z",
  "last_executed_at": "2023-11-07T05:31:56Z",
  "completed_at": "2023-11-07T05:31:56Z",
  "llm_labeling_errors": [
    "<string>"
  ],
  "metadata": {}
}

GET /v1/clusters/{cluster_identifier}


Headers

Authorization

string

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer YOUR_API_KEY"

"Bearer sk_xxxxxxxxxxxxx"

X-Namespace

string

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace ID or the namespace name. Format: ns_xxxxxxxxxxxxx (ID) or a custom name such as 'my-namespace'.

Examples:

"ns_abc123def456"

"production"

"my-namespace"

Path Parameters

cluster_identifier

string

required

Cluster ID or name
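
As a rough sketch of how the two required headers and the path parameter fit together, the following Python snippet builds (but does not send) the request using only the standard library. The helper name `build_get_cluster_request` and the percent-encoding of the identifier are assumptions made for illustration; they are not taken from the Mixpeek documentation.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.mixpeek.com/v1"


def build_get_cluster_request(
    cluster_identifier: str, api_key: str, namespace: str
) -> urllib.request.Request:
    """Build a GET request for /clusters/{cluster_identifier} (hypothetical helper)."""
    # Assumption: percent-encode the identifier in case a cluster name
    # contains characters that are not safe inside a URL path segment.
    path = urllib.parse.quote(cluster_identifier, safe="")
    return urllib.request.Request(
        url=f"{BASE_URL}/clusters/{path}",
        method="GET",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace,
        },
    )


# Placeholder values; substitute a real key, namespace, and cluster.
request = build_get_cluster_request("my cluster", "YOUR_API_KEY", "my-namespace")
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON body with `json.load`.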

Response

Successful Response

Cluster job metadata stored in the MongoDB clusters collection.

This is separate from the cluster documents themselves. It tracks job-level configuration, status, and summary statistics.

Supports both vector and attribute clustering with appropriate metadata.

cluster_name

string

required

Human-readable cluster name

namespace_id

string

required

Namespace this cluster belongs to

input_collections

string[]

required

Source collection IDs that were clustered

cluster_type

enum<string>

required

Type of clustering: vector (embedding-based) or attribute (metadata-based)

Available options: vector, attribute

cluster_id

string

Unique cluster job identifier

source_bucket_ids

string[] | null

Source bucket IDs that the input collections originated from. Enables bucket lineage tracking.

filters

Filters · object

Optional filters that were applied to pre-filter documents before clustering

feature_uris

string[] | null

Feature URIs that were clustered (mixpeek://{extractor}@{version}/{output}). Only for vector clustering.
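
The URI pattern above can be split into its three parts with a small parser. This is a minimal sketch that assumes the extractor and version contain no `/` and the extractor contains no `@`; the sample URI is hypothetical and not a real Mixpeek feature.

```python
import re
from typing import NamedTuple

# Matches mixpeek://{extractor}@{version}/{output}
_FEATURE_URI = re.compile(
    r"^mixpeek://(?P<extractor>[^@/]+)@(?P<version>[^/]+)/(?P<output>.+)$"
)


class FeatureURI(NamedTuple):
    extractor: str
    version: str
    output: str


def parse_feature_uri(uri: str) -> FeatureURI:
    """Split a feature URI into extractor, version, and output."""
    match = _FEATURE_URI.match(uri)
    if match is None:
        raise ValueError(f"not a feature URI: {uri!r}")
    return FeatureURI(**match.groupdict())


# Hypothetical URI, for illustration only.
parsed = parse_feature_uri("mixpeek://text_extractor@v1/embedding")
```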

multi_feature_strategy

string | null

Strategy used if multiple features (concatenate/independent/weighted). Only for vector clustering.

learned_weights

Learned Weights · object

Automatically learned feature weights (when multi_feature_strategy='weighted'). Keys are feature URIs, values are learned weights. Only populated after clustering execution completes.

learning_quality_score

number | null

Clustering quality score from weight learning (e.g., silhouette score). Only populated when multi_feature_strategy='weighted' and weights were learned.

effective_feature_method

string | null

Method for calculating cluster centroids (mean/median/medoid). Only for vector clustering.

clustered_attributes

string[] | null

Attribute field names that were clustered. Only for attribute clustering.

hierarchical_grouping

boolean | null

Whether hierarchical clustering was used. Only for attribute clustering.

aggregation_method

string | null

Method for aggregating attributes (most_frequent/first/last). Only for attribute clustering.

output_collection_ids

string[]

Collection IDs where cluster documents are stored. For single output: list with one collection ID. For per-feature output: list with one collection ID per feature.

output_collection_names

string[]

Names of output collections. Corresponds to output_collection_ids.

algorithm

string | null

Clustering algorithm used (hdbscan, kmeans, attribute_based, etc.)

algorithm_params

Algorithm Params · object

Algorithm-specific parameters (not used for attribute_based)

enrich_source

boolean

default:false

Whether source documents were enriched with cluster_id

source_enrichment_config

SourceEnrichmentConfig · object

Configuration for source enrichment (if enrich_source is true)

Example:

{
  "field_mappings": [
    {
      "source_field": "cluster_id",
      "target_field": "category_id"
    },
    {
      "source_field": "cluster_label",
      "target_field": "category_name"
    },
    {
      "source_field": "distance_to_centroid",
      "target_field": "category_confidence"
    }
  ]
}
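
One way to read the example above is as a rename table: each entry takes a value produced by clustering (`source_field`) and writes it to a field on the source document (`target_field`). The sketch below illustrates that reading client-side. The copy semantics, the helper name `apply_field_mappings`, and the sample values are assumptions; the server-side behavior is not described on this page.

```python
from typing import Any


def apply_field_mappings(
    assignment: dict[str, Any],
    document: dict[str, Any],
    field_mappings: list[dict[str, str]],
) -> dict[str, Any]:
    """Copy mapped cluster fields onto a copy of the source document."""
    enriched = dict(document)  # leave the caller's document untouched
    for mapping in field_mappings:
        source = mapping["source_field"]
        # Skip mappings whose source field is absent from the assignment.
        if source in assignment:
            enriched[mapping["target_field"]] = assignment[source]
    return enriched


field_mappings = [
    {"source_field": "cluster_id", "target_field": "category_id"},
    {"source_field": "cluster_label", "target_field": "category_name"},
    {"source_field": "distance_to_centroid", "target_field": "category_confidence"},
]

# Hypothetical cluster assignment and source document.
assignment = {"cluster_id": "cl_1", "cluster_label": "Shoes", "distance_to_centroid": 0.12}
enriched = apply_field_mappings(assignment, {"title": "Red sneakers"}, field_mappings)
```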

llm_labeling

LLMLabeling · object

Configuration for LLM-based cluster labeling (applies to all cluster types)

Example:

{
  "description": "Text-only labeling with multiple fields",
  "enabled": true,
  "include_keywords": true,
  "include_summary": true,
  "labeling_inputs": {
    "input_mappings": [
      {
        "input_key": "title",
        "path": "title",
        "source_type": "payload"
      },
      {
        "input_key": "description",
        "path": "description",
        "source_type": "payload"
      },
      {
        "input_key": "text",
        "path": "text",
        "source_type": "payload"
      }
    ]
  },
  "model_name": "gpt-4o-mini-2024-07-18",
  "provider": "openai"
}

num_clusters

integer | null

Number of clusters found (excludes noise/outliers, populated after execution)

num_documents_clustered

integer | null

Total documents processed

execution_time_seconds

number | null

Time taken to complete clustering, in seconds

hierarchy_detected

boolean

default:false

Whether implicit hierarchy was detected (multi-feature independent) or created (hierarchical attributes)

parent_cluster_id

string | null

ID of the parent cluster. Set on child clusters in a hierarchy.

child_cluster_ids

string[] | null

IDs of the child clusters. Set on parent clusters in a hierarchy.

hierarchy_relationships

Hierarchy Relationships · object[] | null

Parent-child relationships detected from cluster membership overlap

status

enum<string>

default:PENDING

Cluster job status (propagated from TaskService)

Available options: PENDING, IN_PROGRESS, PROCESSING, COMPLETED, COMPLETED_WITH_ERRORS, FAILED, CANCELED, UNKNOWN, SKIPPED, DRAFT, ACTIVE, ARCHIVED, SUSPENDED
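
Because a cluster job starts as PENDING, a client will typically call this endpoint repeatedly until the job finishes. The sketch below shows one possible polling loop. Which statuses count as final is an assumption (the page lists the values but does not say which are terminal), and `wait_for_cluster` and its `fetch` callback are hypothetical names.

```python
import time
from typing import Any, Callable

# Assumption: once a job reaches one of these statuses it will not change again.
TERMINAL_STATUSES = {"COMPLETED", "COMPLETED_WITH_ERRORS", "FAILED", "CANCELED"}


def wait_for_cluster(
    fetch: Callable[[], dict[str, Any]],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
) -> dict[str, Any]:
    """Call fetch() until the cluster reports a terminal status."""
    for attempt in range(max_attempts):
        cluster = fetch()
        if cluster.get("status") in TERMINAL_STATUSES:
            return cluster
        # Sleep between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"cluster not finished after {max_attempts} attempts")


# Stand-in for real API calls: returns canned responses in order.
_responses = iter(
    [{"status": "PENDING"}, {"status": "PROCESSING"}, {"status": "COMPLETED", "num_clusters": 4}]
)
result = wait_for_cluster(lambda: next(_responses), interval_seconds=0)
```

In real use, `fetch` would issue the GET request for the cluster and return the decoded JSON body.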

last_execution_task_id

string | null

Most recent task ID for this cluster

created_at

string<date-time>

When cluster was created

updated_at

string<date-time>

When cluster was last updated

last_executed_at

string<date-time> | null

Last execution timestamp

completed_at

string<date-time> | null

When clustering completed successfully
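
The sample response shows timestamps as UTC strings with a trailing `Z`, and `last_executed_at` and `completed_at` may be null. A minimal parsing sketch follows, assuming every timestamp uses that format; `parse_timestamp` is a hypothetical helper and the sample values are made up.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp such as '2023-11-07T05:31:56Z', passing None through."""
    if value is None:
        return None
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Hypothetical response fragment.
cluster = {
    "created_at": "2023-11-07T05:31:56Z",
    "completed_at": "2023-11-07T05:33:26Z",
    "last_executed_at": None,
}
created = parse_timestamp(cluster["created_at"])
completed = parse_timestamp(cluster["completed_at"])
elapsed = completed - created  # timedelta between creation and completion
```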

llm_labeling_errors

string[] | null

List of errors encountered during LLM labeling (if any). Stored in MongoDB cluster metadata only, NOT in Qdrant cluster documents. Used to track LLM failures while allowing fallback labels to work.

metadata

Metadata · object

Additional user-defined metadata
