Feature extractors transform raw content from buckets or collections into features stored inside documents. Availability varies by account; see the public catalog and contact support to enable additional extractors as needed.

Overview

  • What they do: Run models to generate vectors and structured fields inside documents.
  • Where they run: As part of ingestion pipelines for a target collection.
  • Outputs: Vectors (dense, sparse, or multi-vector) and payload fields, with index definitions applied by the collection.

Discover extractors

Configure in a collection

Attach extractors when creating a collection. Each entry declares the extractor name and version, plus optional parameters and input mappings. For example:
{
  "collection_name": "products_v1",
  "source": {"type": "BUCKET", "bucket_id": "bkt_123"},
  "feature_extractors": [
    {
      "feature_extractor_name": "gte_modernbert_base",
      "version": "1.0.0",
      "parameters": {},
      "input_mappings": {"text": "payload.title"}
    }
  ]
}
  • Extractor outputs determine feature field names and vector index requirements for the collection.
  • Use Describe Collection Features to see resolved addresses and metadata.
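
The request body above can also be assembled programmatically. The following is a minimal sketch that only builds and prints the JSON shown in this section; the helper names `extractor_entry` and `collection_payload` are hypothetical and not part of any SDK, and no API call is made.

```python
import json


def extractor_entry(name, version, parameters=None, input_mappings=None):
    """Build one feature_extractors entry (hypothetical helper).

    Name and version are required; parameters and mappings are optional.
    """
    if not name or not version:
        raise ValueError("extractor name and version are required")
    return {
        "feature_extractor_name": name,
        "version": version,
        "parameters": parameters or {},
        "input_mappings": input_mappings or {},
    }


def collection_payload(collection_name, bucket_id, extractors):
    """Assemble the collection body using the field names from the example above."""
    return {
        "collection_name": collection_name,
        "source": {"type": "BUCKET", "bucket_id": bucket_id},
        "feature_extractors": list(extractors),
    }


payload = collection_payload(
    "products_v1",
    "bkt_123",
    [
        extractor_entry(
            "gte_modernbert_base",
            "1.0.0",
            input_mappings={"text": "payload.title"},
        )
    ],
)

# Serialize for use as a request body; sending it is left to your HTTP client.
print(json.dumps(payload, indent=2))
```

Validating name and version up front mirrors the requirement that every entry declares both; everything else defaults to an empty object.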

Behavior & availability

  • Account‑dependent: Certain extractors are not enabled by default; request access if needed.
  • Versioned: Changing an extractor's version typically requires reprocessing existing documents so that all stored features come from the same model version.
  • Indexes: Required vector/payload indexes are applied by the engine based on extractor outputs.
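
To illustrate the versioning point, here is a small sketch that compares two `feature_extractors` lists and reports which extractors changed version, which is a reasonable signal that reprocessing may be needed. The function `changed_extractors` and the version `1.1.0` are hypothetical and used only for illustration; the platform may detect this differently.

```python
def changed_extractors(old_entries, new_entries):
    """Return (name, old_version, new_version) for extractors whose version differs.

    Both arguments are lists of entries shaped like the feature_extractors
    items in the collection example. Newly added extractors are ignored.
    """
    old_versions = {e["feature_extractor_name"]: e["version"] for e in old_entries}
    changed = []
    for entry in new_entries:
        name = entry["feature_extractor_name"]
        if name in old_versions and old_versions[name] != entry["version"]:
            changed.append((name, old_versions[name], entry["version"]))
    return changed


current = [{"feature_extractor_name": "gte_modernbert_base", "version": "1.0.0"}]
# "1.1.0" is a made-up version number for this example.
proposed = [{"feature_extractor_name": "gte_modernbert_base", "version": "1.1.0"}]

for name, old, new in changed_extractors(current, proposed):
    print(f"{name}: {old} -> {new} (reprocessing likely required)")
```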

Used by

  • Collections: Store the produced features in documents.
  • Retrievers: Search across vectors and payloads.
  • Taxonomies: Join and enrich using feature fields.
  • Clusters: Build similarity groups over feature vectors.

Manage and inspect

See also