Collections are where processed documents live. They define how outputs are stored after feature extraction and serve as the main retrieval surface.
Overview
- What they store: Structured documents produced by feature extractors.
- Inputs: Typically ingested from a bucket (raw objects) or another collection.
- Consistency: All documents in a collection share a schema determined by configured feature extractors.
- Retrieval: Retrievers query one or more collections to return ranked results.
Collection model
Minimum fields when creating a collection. See the endpoint at Create Collection.
- source.type: BUCKET or COLLECTION.
- feature_extractors: Which models generate vectors/fields for documents.
- metadata: User-defined metadata attached to the collection record.
Create a collection
- API: Create Collection
- Method: POST
- Path: /v1/collections
- Reference: API Reference
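As a rough illustration of the request described above, the following sketch builds (but does not send) a POST to /v1/collections carrying the three minimum fields. The base URL, the bearer-token header, and the shape of each feature_extractors entry are assumptions made for the example; check the API Reference for the actual values.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_create_collection_request(source_type, feature_extractors,
                                    metadata=None, api_key="YOUR_API_KEY"):
    """Build a POST /v1/collections request without sending it."""
    # source.type is documented as BUCKET or COLLECTION.
    if source_type not in ("BUCKET", "COLLECTION"):
        raise ValueError("source_type must be 'BUCKET' or 'COLLECTION'")

    payload = {
        "source": {"type": source_type},
        "feature_extractors": feature_extractors,
        "metadata": metadata or {},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/collections",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the real scheme may differ.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # The extractor entry below is hypothetical, shown only for shape.
    request = build_create_collection_request(
        "BUCKET",
        [{"name": "example_extractor"}],
        metadata={"team": "search"},
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.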
Describe available features
List configured feature addresses and metadata for a collection.
- API: Describe Collection Features
- Method: GET
- Path: /v1/collections/{collection_identifier}/features
- Reference: API Reference
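A minimal sketch of how the features path might be filled in, assuming a placeholder base URL and bearer-token auth (neither is specified on this page). It only constructs the GET request; the response format is not shown because it is defined in the API Reference.

```python
import urllib.parse
import urllib.request

# Assumption: placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_describe_features_request(collection_identifier,
                                    api_key="YOUR_API_KEY"):
    """Build a GET /v1/collections/{collection_identifier}/features request."""
    if not collection_identifier:
        raise ValueError("collection_identifier must be a non-empty string")

    # Percent-encode the identifier so spaces or slashes cannot alter the path.
    encoded = urllib.parse.quote(collection_identifier, safe="")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/collections/{encoded}/features",
        # Assumption: bearer-token auth; the real scheme may differ.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


if __name__ == "__main__":
    # "my collection" is a hypothetical identifier.
    request = build_describe_features_request("my collection")
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```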
Manage collections
Documents in collections
Documents are created by pipelines and stored in the target collection. Use the Collection Documents APIs to read and update them.
Used by
- Retrievers: Query collections for search results.
- Taxonomies: Enrich documents using similarity joins; configured against collection IDs.
- Clusters: Build and analyze clusters over vectors from one or more collections.
Behavior & validation
- Schema: Determined by feature extractor outputs; documents must conform.
- Lineage: Documents maintain references to the source object and processing metadata.
- Enablement: Collections can be enabled/disabled via the update endpoint.
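To make the schema rule concrete, here is a purely illustrative client-side check of what "documents must conform" could mean. The schema representation (field name mapped to a Python type) and the function name are hypothetical; the service performs its own validation, and its real schema format may look quite different.

```python
def find_schema_violations(document, schema):
    """Return a list of human-readable problems; empty means the document conforms.

    `schema` is a hypothetical mapping of field name -> expected Python type.
    """
    violations = []
    for field, expected_type in schema.items():
        if field not in document:
            violations.append(f"missing field: {field}")
        elif not isinstance(document[field], expected_type):
            actual = type(document[field]).__name__
            violations.append(
                f"wrong type for {field}: expected {expected_type.__name__}, got {actual}"
            )
    return violations


if __name__ == "__main__":
    # Hypothetical schema, as if produced by one text feature extractor.
    schema = {"text": str, "embedding": list}
    print(find_schema_violations({"text": "hello", "embedding": [0.1, 0.2]}, schema))
    print(find_schema_violations({"text": 5}, schema))
```

Returning a list of violations rather than raising on the first one lets a caller report every problem in a document at once.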