Create and run clustering jobs
- Create: Click New Cluster, select collections, pick vector or attribute clustering, and configure algorithm parameters. API: Create Cluster.
- Execute: Run clustering in real time on the Engine, or submit it as a job for asynchronous processing. API: Execute Clustering and Submit Job.
- Inspect: Review centroids, metrics, and, if they were saved, members. Downloadable outputs such as Parquet file paths are listed under Artifacts. API: Get Artifacts.
- List/Get/Delete: Manage clustering configurations and results. API: List, Get, Delete.
- Stream data: Browse cluster centroids and members directly. API: Stream Data.
- Apply enrichment: Attach cluster labels back to a source or target collection at scale. API: Apply Enrichment.
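
To make the create, execute, inspect, and enrich sequence concrete, here is a minimal local sketch in plain Python. It does not call the platform's API: the `kmeans` helper, the `config` layout, and the `cluster_id` field are all hypothetical stand-ins, and k-means is used only as one example of a vector clustering algorithm.

```python
"""Local stand-in for the create -> execute -> inspect -> enrich flow.

Nothing here calls the product's API; every name is hypothetical.
"""
import math


def kmeans(vectors, k, iterations=10):
    """Plain k-means. Returns (centroids, labels); labels[i] is the cluster of vectors[i]."""
    # Seed centroids with the first k vectors so the run is deterministic.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# "Create": a hypothetical configuration (vector clustering, two clusters).
config = {"type": "vector", "algorithm": "kmeans", "params": {"k": 2}}

# Toy source collection with two well-separated groups of embeddings.
documents = [
    {"id": "a", "embedding": [0.0, 0.1]},
    {"id": "b", "embedding": [5.0, 5.1]},
    {"id": "c", "embedding": [0.2, 0.0]},
    {"id": "d", "embedding": [5.2, 4.9]},
]

# "Execute": run the clustering.
centroids, labels = kmeans([d["embedding"] for d in documents], **config["params"])

# "Inspect": list the member ids of each cluster.
members_by_cluster = {
    c: [d["id"] for d, lab in zip(documents, labels) if lab == c]
    for c in range(config["params"]["k"])
}

# "Apply enrichment": attach each cluster label back to its source document.
enriched = [{**doc, "cluster_id": label} for doc, label in zip(documents, labels)]
```

In the hosted workflow the clustering itself runs on the Engine or as a job; the sketch only shows how centroids, members, and enriched documents relate to one another.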
Tips
- Start with a small sample size to validate parameters before committing to a full run.
- Use LLM labeling for human-friendly labels when vectors are dense and unlabeled.
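
The sampling tip can be sketched as a small parameter sweep over one fixed sample. This is an illustration only: `validate_on_sample` and `fake_run` are hypothetical names, the scoring callable is supplied by the caller, and the sketch assumes a lower score is better.

```python
import random


def validate_on_sample(items, sample_size, candidate_params, run_clustering, seed=0):
    """Score each candidate parameter set on one fixed random sample.

    run_clustering(sample, params) -> float is supplied by the caller;
    a lower score is assumed to be better.
    """
    rng = random.Random(seed)  # fixed seed so every candidate sees the same sample
    sample = rng.sample(items, min(sample_size, len(items)))
    scores = [(run_clustering(sample, params), params) for params in candidate_params]
    best_score, best_params = min(scores, key=lambda pair: pair[0])
    return best_params, best_score, len(sample)


def fake_run(sample, params):
    # Hypothetical stand-in for a real clustering run: pretends k=3 fits best.
    return abs(params["k"] - 3)


best_params, best_score, sample_len = validate_on_sample(
    items=list(range(1000)),
    sample_size=100,
    candidate_params=[{"k": 2}, {"k": 3}, {"k": 5}],
    run_clustering=fake_run,
)
```

Once a candidate looks reasonable on the sample, the same parameters can be used for the full run.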
1. Create a cluster job: Choose collections and configure algorithm parameters; optionally set dimensionality reduction.
2. Execute or submit: Run in real time, or submit as an asynchronous job and track it via Tasks.
3. Inspect and enrich: Review centroids and metrics, then apply enrichment back to collections if desired.
Artifacts such as Parquet file paths support downstream analytics and reproducible exploration.
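
For the asynchronous path, tracking a submitted job usually comes down to polling its task until it reaches a terminal state. The sketch below assumes a caller-supplied `get_status` function. The function name, the task id, and the status strings are hypothetical, not values documented by the platform.

```python
import time


def wait_for_job(get_status, task_id, poll_seconds=2.0, timeout_seconds=600.0,
                 sleep=time.sleep):
    """Poll get_status(task_id) until it reports a terminal state.

    The status strings below are assumptions, not documented values.
    """
    terminal = {"COMPLETED", "FAILED", "CANCELED"}
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(task_id)
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout_seconds}s")
        sleep(poll_seconds)


# Simulated task that finishes on the third status check.
_states = iter(["PENDING", "IN_PROGRESS", "COMPLETED"])
final_status = wait_for_job(
    get_status=lambda task_id: next(_states),
    task_id="task_123",          # hypothetical id
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```

Injecting `get_status` and `sleep` keeps the loop independent of any particular client library and makes it easy to exercise without network access.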