Features & Feature URIs

Features are the model outputs produced when collections run their configured extractors. They live inside documents, power retrieval and enrichment, and are always referenced with a stable URI.

Feature URIs

mixpeek://{extractor_name}@{version}/{output_name}

Examples:

mixpeek://text_extractor@v1/text_embedding
mixpeek://clip_vit_l_14@v1/image_embedding
mixpeek://splade_extractor@v1/splade_vector
mixpeek://colbert_extractor@v1/colbert_embeddings
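Every feature URI follows the same three-part pattern, so it can be split mechanically. The sketch below is a hypothetical client-side helper; `parse_feature_uri` and `FeatureURI` are not part of any Mixpeek SDK. It assumes extractor and output names use lowercase letters, digits, and underscores, and that versions look like `v1` or `v2`, as in the examples above.

```python
import re
from dataclasses import dataclass

# Hypothetical helper mirroring the documented pattern
# mixpeek://{extractor_name}@{version}/{output_name}
# Assumption: names are [a-z0-9_]+ and versions are "v" followed by digits.
_FEATURE_URI = re.compile(
    r"^mixpeek://(?P<extractor>[a-z0-9_]+)@(?P<version>v\d+)/(?P<output>[a-z0-9_]+)$"
)


@dataclass(frozen=True)
class FeatureURI:
    extractor: str
    version: str
    output: str

    def __str__(self) -> str:
        # Rebuild the canonical URI string from its parts.
        return f"mixpeek://{self.extractor}@{self.version}/{self.output}"


def parse_feature_uri(uri: str) -> FeatureURI:
    """Split a feature URI into extractor, version, and output name."""
    match = _FEATURE_URI.match(uri)
    if match is None:
        raise ValueError(f"not a feature URI: {uri!r}")
    return FeatureURI(match["extractor"], match["version"], match["output"])


uri = parse_feature_uri("mixpeek://clip_vit_l_14@v1/image_embedding")
print(uri.extractor, uri.version, uri.output)  # clip_vit_l_14 v1 image_embedding
```

Round-tripping through `str()` returns the canonical form, which is convenient when comparing URIs from different configs.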

Use feature URIs when:

Defining retriever stages (feature_address)
Configuring taxonomy input mappings
Building clustering jobs
Inspecting collection output schemas

Feature Anatomy

Documents store features as top-level fields alongside regular metadata. Dense embeddings are stored as arrays of floats and sparse vectors as index/weight pairs:

{
  "document_id": "doc_123",
  "collection_id": "col_products",
  "metadata": {
    "category": "audio",
    "brand": "Acme"
  },
  "text_extractor_v1_embedding": [0.12, 0.03, ...],
  "splade_extractor_v1_vector": {
    "indices": [345, 912],
    "values": [0.82, 0.54]
  },
  "clip_vit_l_14_v1_embedding": [0.09, 0.41, ...],
  "source_blobs": [...],
  "internal_metadata": {
    "processing_history": [...]
  }
}

Vector fields are written automatically to Qdrant as named vectors. The remaining (payload) fields are kept in MongoDB and in the Qdrant payload, where they are available for filtering and enrichment.
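To make the vector/payload split concrete, here is a minimal sketch that partitions a document like the one above. The detection rules (a non-empty list of numbers is a dense vector, a list of such lists is a multi-vector, an object with exactly `indices` and `values` is a sparse vector) are assumptions inferred from the example, not Mixpeek's ingestion code. The `...` placeholders are replaced with short illustrative numbers.

```python
from typing import Any, Dict, Tuple


def is_dense(value: Any) -> bool:
    # Assumed dense shape: a non-empty list of numbers (booleans excluded).
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in value)
    )


def is_multi(value: Any) -> bool:
    # Assumed multi-vector shape (e.g. per-token ColBERT output): a list of dense vectors.
    return isinstance(value, list) and len(value) > 0 and all(is_dense(v) for v in value)


def is_sparse(value: Any) -> bool:
    # Assumed sparse shape: {"indices": [...], "values": [...]} with equal lengths.
    return (
        isinstance(value, dict)
        and set(value) == {"indices", "values"}
        and len(value["indices"]) == len(value["values"])
    )


def split_document(doc: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (named_vectors, payload) for a document shaped like the example."""
    vectors: Dict[str, Any] = {}
    payload: Dict[str, Any] = {}
    for field, value in doc.items():
        if is_dense(value) or is_multi(value) or is_sparse(value):
            vectors[field] = value
        else:
            payload[field] = value
    return vectors, payload


doc = {
    "document_id": "doc_123",
    "metadata": {"category": "audio", "brand": "Acme"},
    "text_extractor_v1_embedding": [0.12, 0.03, 0.57],
    "splade_extractor_v1_vector": {"indices": [345, 912], "values": [0.82, 0.54]},
}
vectors, payload = split_document(doc)
print(sorted(vectors))  # ['splade_extractor_v1_vector', 'text_extractor_v1_embedding']
print(sorted(payload))  # ['document_id', 'metadata']
```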

Inspect Available Features

GET /v1/collections/{collection_id} – returns the deterministic output_schema.
GET /v1/collections/{collection_id}/features – enumerates feature URIs, dimensions, and metadata.
GET /v1/feature-extractors – lists available extractors, their versions, and their output fields.
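The inspection endpoints can be called with any HTTP client. This sketch uses only Python's standard library. The base URL `https://api.mixpeek.com` and the `Authorization: Bearer` header are assumptions, so confirm the host and authentication scheme for your deployment; the network call is left commented out so the snippet runs offline.

```python
import json
import urllib.request
from typing import Any

# Assumption: placeholder host and Bearer-token auth; verify against your account settings.
BASE_URL = "https://api.mixpeek.com"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for one of the inspection endpoints."""
    req = urllib.request.Request(f"{BASE_URL}{path}", method="GET")
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Accept", "application/json")
    return req


def get_json(path: str, api_key: str) -> Any:
    """Send the request and decode the JSON response body (performs network I/O)."""
    with urllib.request.urlopen(build_request(path, api_key), timeout=30) as resp:
        return json.load(resp)


collection_id = "col_products"
paths = [
    f"/v1/collections/{collection_id}",           # output_schema
    f"/v1/collections/{collection_id}/features",  # feature URIs, dimensions, metadata
    "/v1/feature-extractors",                     # extractors, versions, output fields
]
# for path in paths:
#     print(get_json(path, api_key="YOUR_API_KEY"))
```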

Feature Types

Extractor	Feature Type	Typical Use
`text_extractor`	Dense embedding (1024–1536 dims)	Semantic text search
`splade_extractor`	Sparse vector (indices + weights)	Lexical / hybrid search
`colbert_extractor`	Multi-vector (per-token)	Late interaction search
`clip_vit_l_14`	Dense multimodal embedding	Image & text similarity
`video_extractor`	Scene embeddings + metadata	Video retrieval & analytics
`whisper_large_v3`	Transcription + timestamps	Audio search & diarization

Refer to Feature Extractors for full schemas and parameters.
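The vector shapes in the table are compared differently at query time. For orientation, the textbook scoring function for each shape is sketched below: cosine similarity for dense embeddings, a dot product over shared indices for sparse vectors, and ColBERT-style MaxSim for multi-vectors. These are generic illustrations, not Mixpeek's or Qdrant's implementation; the metric actually applied depends on how each named vector is configured in the vector store.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dense embeddings (e.g. text_extractor, clip_vit_l_14): cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sparse_dot(a: Dict[str, list], b: Dict[str, list]) -> float:
    """Sparse vectors (e.g. splade_extractor): dot product over indices present in both."""
    weights = dict(zip(a["indices"], a["values"]))
    return sum(weights.get(i, 0.0) * v for i, v in zip(b["indices"], b["values"]))


def maxsim(query: List[Sequence[float]], doc: List[Sequence[float]]) -> float:
    """Multi-vectors (e.g. colbert_extractor): for each query token, take the best
    dot product against any document token, then sum over query tokens."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc)
        for qv in query
    )


print(cosine([3.0, 4.0], [4.0, 3.0]))  # 0.96
```

Sparse scoring only touches indices that both vectors share, which is why lexical/hybrid search stays cheap even with a large vocabulary.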

Working with Feature URIs

Retriever stage example:

{
  "stage_name": "knn_search",
  "version": "v1",
  "parameters": {
    "feature_address": "mixpeek://text_extractor@v1/text_embedding",
    "input_mapping": { "text": "query_text" },
    "limit": 20
  }
}
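If retriever configs are generated in code, a small builder can reject malformed feature addresses before the request is sent. `knn_stage` below is a hypothetical helper that reproduces the `knn_search` stage shown above. The `"text"` input key is copied from that example (other extractors may expect different input keys), and the URI regex assumes the `mixpeek://{extractor_name}@{version}/{output_name}` pattern with versions like `v1`.

```python
import json
import re
from typing import Any, Dict

# Assumed shape of a feature URI: mixpeek://{extractor_name}@{version}/{output_name}
FEATURE_URI = re.compile(r"^mixpeek://[a-z0-9_]+@v\d+/[a-z0-9_]+$")


def knn_stage(feature_address: str, query_input: str, limit: int = 20) -> Dict[str, Any]:
    """Hypothetical helper assembling a knn_search stage like the example above."""
    if not FEATURE_URI.match(feature_address):
        raise ValueError(f"malformed feature URI: {feature_address!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return {
        "stage_name": "knn_search",
        "version": "v1",
        "parameters": {
            "feature_address": feature_address,
            "input_mapping": {"text": query_input},  # key taken from the example
            "limit": limit,
        },
    }


stage = knn_stage("mixpeek://text_extractor@v1/text_embedding", "query_text")
print(json.dumps(stage, indent=2))
```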

Taxonomy example:

{
  "taxonomy_type": "flat",
  "retriever_id": "ret_face_matcher",
  "input_mappings": {
    "query_embedding": "mixpeek://face_detector@v2/face_embedding"
  }
}
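To catch naming mismatches early, you can collect every feature URI a retriever or taxonomy config references and compare it with the features the collection actually produces. The sketch below walks an arbitrary nested config. The `available` set is hypothetical; in practice it would be built from the `GET /v1/collections/{collection_id}/features` response, whose exact shape is not shown here.

```python
from typing import Any, Iterator, List, Set

PREFIX = "mixpeek://"


def iter_feature_uris(config: Any) -> Iterator[str]:
    """Yield every string in a nested dict/list config that starts with mixpeek://."""
    if isinstance(config, dict):
        for value in config.values():
            yield from iter_feature_uris(value)
    elif isinstance(config, list):
        for item in config:
            yield from iter_feature_uris(item)
    elif isinstance(config, str) and config.startswith(PREFIX):
        yield config


def missing_features(config: Any, available: Set[str]) -> List[str]:
    """Return referenced URIs that are not in the collection's feature list."""
    return sorted({uri for uri in iter_feature_uris(config) if uri not in available})


taxonomy = {
    "taxonomy_type": "flat",
    "retriever_id": "ret_face_matcher",
    "input_mappings": {"query_embedding": "mixpeek://face_detector@v2/face_embedding"},
}
# Hypothetical: this collection only produces the v1 face embedding.
available = {"mixpeek://face_detector@v1/face_embedding"}
print(missing_features(taxonomy, available))  # ['mixpeek://face_detector@v2/face_embedding']
```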

Best Practices

Version carefully – upgrading an extractor version creates new feature URIs. Re-index collections or create new ones for breaking changes.
Name consistently – stick to canonical URIs in retrievers and enrichment jobs to avoid mismatches.
Store passthrough metadata – combine features with metadata fields (category, locale) for precise filters and joins.
Monitor extractor performance – Analytics endpoints (/v1/analytics/extractors/performance) help validate throughput and latency.
Leverage inference caching – repeated calls to the same feature URI benefit from the Engine’s inference cache.

With feature URIs, you always know which model (and version) populated a document field, ensuring queries remain compatible with ingestion pipelines.
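Because an extractor upgrade produces new URIs, configs that still point at the old version stop matching. The sketch below flags referenced URIs whose exact version is unavailable while the same extractor and output exist at another version. It is a client-side check, assuming the `mixpeek://{extractor_name}@{version}/{output_name}` pattern with versions like `v1`; the `text_extractor@v2` URI in the example is hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed URI pattern with named parts.
FEATURE_URI = re.compile(
    r"^mixpeek://(?P<extractor>[a-z0-9_]+)@(?P<version>v\d+)/(?P<output>[a-z0-9_]+)$"
)


def _feature_key(uri: str) -> Tuple[str, str]:
    """Return (extractor, output), ignoring the version."""
    m = FEATURE_URI.match(uri)
    if m is None:
        raise ValueError(f"not a feature URI: {uri!r}")
    return m["extractor"], m["output"]


def stale_references(referenced: Iterable[str], available: Iterable[str]) -> Dict[str, List[str]]:
    """Map each stale referenced URI to the same feature at the versions that do exist."""
    available_set = set(available)
    candidates: Dict[Tuple[str, str], List[str]] = {}
    for uri in available_set:
        candidates.setdefault(_feature_key(uri), []).append(uri)
    stale: Dict[str, List[str]] = {}
    for uri in referenced:
        if uri in available_set:
            continue  # exact version exists; nothing to migrate
        key = _feature_key(uri)
        if key in candidates:
            stale[uri] = sorted(candidates[key])
    return stale


referenced = [
    "mixpeek://text_extractor@v1/text_embedding",
    "mixpeek://clip_vit_l_14@v1/image_embedding",
]
available = [
    "mixpeek://text_extractor@v2/text_embedding",  # hypothetical upgraded version
    "mixpeek://clip_vit_l_14@v1/image_embedding",
]
print(stale_references(referenced, available))
# {'mixpeek://text_extractor@v1/text_embedding': ['mixpeek://text_extractor@v2/text_embedding']}
```

URIs with no counterpart at any version are not reported here; a plain membership check covers those.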
