Features are the model outputs produced when collections run their configured extractors. They live inside documents, power retrieval and enrichment, and are always referenced with a stable URI.

Feature URIs

mixpeek://{extractor_name}@{version}/{output_name}
Examples:
  • mixpeek://text_extractor@v1/text_embedding
  • mixpeek://clip_vit_l_14@v1/image_embedding
  • mixpeek://splade_extractor@v1/splade_vector
  • mixpeek://colbert_extractor@v1/colbert_embeddings
Use feature URIs when:
  • Defining retriever stages (feature_address)
  • Configuring taxonomy input mappings
  • Building clustering jobs
  • Inspecting collection output schemas
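The URI pattern above can be split into its three parts with a small helper. This is a minimal sketch, not part of any official SDK: the names `FeatureURI`, `parse_feature_uri`, and `format_feature_uri` are hypothetical, and the allowed character set in the regular expression is an assumption inferred from the examples.

```python
import re
from typing import NamedTuple

# Pattern for mixpeek://{extractor_name}@{version}/{output_name}.
# The permitted characters are an assumption based on the documented examples.
_FEATURE_URI = re.compile(
    r"^mixpeek://(?P<extractor>[A-Za-z0-9_]+)"
    r"@(?P<version>[A-Za-z0-9_.]+)"
    r"/(?P<output>[A-Za-z0-9_]+)$"
)


class FeatureURI(NamedTuple):
    extractor: str
    version: str
    output: str


def parse_feature_uri(uri: str) -> FeatureURI:
    """Split a feature URI into extractor name, version, and output name."""
    match = _FEATURE_URI.match(uri)
    if match is None:
        raise ValueError(f"not a feature URI: {uri!r}")
    return FeatureURI(match["extractor"], match["version"], match["output"])


def format_feature_uri(feature: FeatureURI) -> str:
    """Rebuild the canonical URI string from its parts."""
    return f"mixpeek://{feature.extractor}@{feature.version}/{feature.output}"


if __name__ == "__main__":
    print(parse_feature_uri("mixpeek://text_extractor@v1/text_embedding"))
```

Parsing before use makes it easy to reject malformed addresses early, for example when validating a retriever configuration before submitting it.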

Feature Anatomy

Documents store features as regular fields plus vector payloads:
{
  "document_id": "doc_123",
  "collection_id": "col_products",
  "metadata": {
    "category": "audio",
    "brand": "Acme"
  },
  "text_extractor_v1_embedding": [0.12, 0.03, ...],
  "splade_extractor_v1_vector": {
    "indices": [345, 912],
    "values": [0.82, 0.54]
  },
  "clip_vit_l_14_v1_embedding": [0.09, 0.41, ...],
  "source_blobs": [...],
  "internal_metadata": {
    "processing_history": [...]
  }
}
Vector fields are automatically written to Qdrant as named vectors. Payload fields are stored in MongoDB and in Qdrant payloads, where they are available for filtering and enrichment.
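To make the vector/payload distinction concrete, the sketch below sorts the fields of a document like the one above into vector fields and payload fields. It classifies purely by the shape of each value, which is an assumption for illustration and not a description of how the platform decides; `classify_field` and `split_document` are hypothetical names.

```python
from typing import Any


def classify_field(value: Any) -> str:
    """Return 'sparse', 'dense', or 'payload' based only on the value's shape."""
    # Sparse vectors in the sample document are objects with indices and values.
    if isinstance(value, dict) and set(value) == {"indices", "values"}:
        return "sparse"
    # Dense vectors are non-empty lists of numbers (heuristic: a numeric
    # payload list would also match, so real code should use the schema).
    if isinstance(value, list) and value and all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in value
    ):
        return "dense"
    return "payload"


def split_document(doc: dict) -> tuple[dict, dict]:
    """Separate a document into (vector_fields, payload_fields)."""
    vectors, payload = {}, {}
    for name, value in doc.items():
        target = payload if classify_field(value) == "payload" else vectors
        target[name] = value
    return vectors, payload


if __name__ == "__main__":
    sample = {
        "document_id": "doc_123",
        "metadata": {"category": "audio", "brand": "Acme"},
        "text_extractor_v1_embedding": [0.12, 0.03],
        "splade_extractor_v1_vector": {"indices": [345, 912], "values": [0.82, 0.54]},
    }
    vectors, payload = split_document(sample)
    print(sorted(vectors), sorted(payload))
```

In practice the collection's output_schema is the authoritative source for which fields are vectors; the shape check here only illustrates the split.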

Inspect Available Features

  • GET /v1/collections/{collection_id} – returns the deterministic output_schema.
  • GET /v1/collections/{collection_id}/features – enumerates feature URIs, dimensions, and metadata.
  • GET /v1/feature-extractors – discover available extractors, versions, and output fields.
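The three endpoints above could be called from a script along the following lines. This is a sketch under stated assumptions: `BASE_URL` is a placeholder rather than the real API host, and the bearer-token `Authorization` header is assumed, since neither appears on this page. The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Placeholder host (assumption): replace with the API host for your deployment.
BASE_URL = "https://api.example.com"


def feature_endpoints(collection_id: str) -> dict:
    """Build the URLs of the three inspection endpoints listed above."""
    cid = quote(collection_id, safe="")
    return {
        "collection": f"{BASE_URL}/v1/collections/{cid}",
        "features": f"{BASE_URL}/v1/collections/{cid}/features",
        "extractors": f"{BASE_URL}/v1/feature-extractors",
    }


def get_json(url: str, api_key: str):
    """Send a GET request and decode the JSON body.

    The bearer-token header is an assumption about the auth scheme.
    """
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URLs; no network call is made here.
    for name, url in feature_endpoints("col_products").items():
        print(name, url)
```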

Feature Types

Extractor         | Feature Type                      | Typical Use
text_extractor    | Dense embedding (1024–1536 dims)  | Semantic text search
splade_extractor  | Sparse vector (indices + weights) | Lexical / hybrid search
colbert_extractor | Multi-vector (per-token)          | Late interaction search
clip_vit_l_14     | Dense multimodal embedding        | Image & text similarity
video_extractor   | Scene embeddings + metadata       | Video retrieval & analytics
whisper_large_v3  | Transcription + timestamps        | Audio search & diarization
Refer to Feature Extractors for full schemas and parameters.
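The sparse and multi-vector feature types are scored differently from a single dense embedding. The sketch below shows the two textbook formulas: a dot product over shared indices for sparse vectors, and MaxSim (sum over query tokens of the best-matching document token) for late interaction. It illustrates the general techniques only; whether the platform scores exactly this way is not stated here, and the function names are hypothetical.

```python
def sparse_dot(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors given as {'indices': [...], 'values': [...]}."""
    weights = dict(zip(b["indices"], b["values"]))
    # Only indices present in both vectors contribute to the score.
    return sum(v * weights.get(i, 0.0) for i, v in zip(a["indices"], a["values"]))


def max_sim(query_tokens: list, doc_tokens: list) -> float:
    """Late-interaction score: for each query token vector, take its best
    dot product against any document token vector, then sum the maxima."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


if __name__ == "__main__":
    doc_vec = {"indices": [345, 912], "values": [0.82, 0.54]}
    query_vec = {"indices": [912, 1000], "values": [0.5, 1.0]}
    print(sparse_dot(doc_vec, query_vec))
    print(max_sim([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]]))
```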

Working with Feature URIs

Retriever stage example:
{
  "stage_name": "knn_search",
  "version": "v1",
  "parameters": {
    "feature_address": "mixpeek://text_extractor@v1/text_embedding",
    "input_mapping": { "text": "query_text" },
    "limit": 20
  }
}
Taxonomy example:
{
  "taxonomy_type": "flat",
  "retriever_id": "ret_face_matcher",
  "input_mappings": {
    "query_embedding": "mixpeek://face_detector@v2/face_embedding"
  }
}
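When several retrievers share the same shape, the stage payload can be generated instead of written by hand. The helper below is a hypothetical sketch that reproduces the knn_search stage shown above from a feature URI; `knn_stage` is not an SDK function, and the only validation it performs is a check on the URI scheme.

```python
import json


def knn_stage(feature_address: str, query_field: str, limit: int = 20) -> dict:
    """Build a knn_search stage dict with the same fields as the example above."""
    if not feature_address.startswith("mixpeek://"):
        raise ValueError(f"expected a mixpeek:// feature URI, got {feature_address!r}")
    return {
        "stage_name": "knn_search",
        "version": "v1",
        "parameters": {
            "feature_address": feature_address,
            # Maps the stage's "text" input to a field of the incoming query.
            "input_mapping": {"text": query_field},
            "limit": limit,
        },
    }


if __name__ == "__main__":
    stage = knn_stage("mixpeek://text_extractor@v1/text_embedding", "query_text")
    print(json.dumps(stage, indent=2))
```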

Best Practices

  1. Version carefully – upgrading an extractor version creates new feature URIs. Re-index collections or create new ones for breaking changes.
  2. Name consistently – stick to canonical URIs in retrievers and enrichment jobs to avoid mismatches.
  3. Store passthrough metadata – combine features with metadata fields (category, locale) for precise filters and joins.
  4. Monitor extractor performance – the analytics endpoint (/v1/analytics/extractors/performance) helps validate throughput and latency.
  5. Leverage inference caching – repeated calls to the same feature URI benefit from the Engine’s inference cache.
With feature URIs, you always know which model (and version) populated a document field, ensuring queries remain compatible with ingestion pipelines.
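One way to apply the versioning and naming practices is to check, before deploying a retriever, that every stage references a feature URI the collection actually exposes. The sketch below assumes you have already collected the available URIs (for example from the collection's features listing) into a set; `missing_feature_addresses` is a hypothetical helper, and the `@v2` URI in the demo is invented for illustration.

```python
def missing_feature_addresses(stages: list, available: set) -> list:
    """Return feature_address values used by stages but absent from `available`."""
    missing = []
    for stage in stages:
        address = stage.get("parameters", {}).get("feature_address")
        if address is not None and address not in available:
            missing.append(address)
    return missing


if __name__ == "__main__":
    # Hypothetical situation: the collection was re-indexed with a v2 extractor,
    # but the retriever stage still points at the v1 feature URI.
    available = {"mixpeek://text_extractor@v2/text_embedding"}
    stages = [
        {
            "stage_name": "knn_search",
            "version": "v1",
            "parameters": {
                "feature_address": "mixpeek://text_extractor@v1/text_embedding",
                "limit": 20,
            },
        }
    ]
    print(missing_feature_addresses(stages, available))
```

A non-empty result signals that a retriever is still pinned to a feature version the collection no longer produces.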