Retriever enrichments let you attach arbitrary retriever pipelines to collections. When documents are ingested, the configured retrievers execute against each document and write selected result fields back to the document. This enables LLM classification, cross-collection joins, and multi-stage enrichment without building custom extractor plugins.

How It Works

  1. Attach a retriever enrichment to a collection with input mappings and write-back field configuration
  2. Ingest documents via batch processing as usual
  3. Post-processing executes the retriever for each document, mapping document fields to retriever inputs
  4. Write-back extracts specified fields from retriever results and writes them to the document
Retriever enrichments run in Phase 4 of post-processing by default (after taxonomies, clusters, and alerts), but you can configure them to run in any phase.

Configuration

Each retriever enrichment has three main sections:

Input Mappings

Map document fields or constant values to retriever input parameters:
{
  "input_mappings": [
    {
      "input_key": "query",
      "source": {
        "source_type": "document_field",
        "path": "title"
      }
    },
    {
      "input_key": "collection_id",
      "source": {
        "source_type": "constant",
        "value": "col_reference_data"
      }
    }
  ]
}
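
To make the mapping semantics concrete, here is a minimal sketch of how input mappings could be resolved against a single document. The resolve_inputs helper and the sample document are hypothetical and only illustrate the configuration shape shown above; they are not part of the Mixpeek SDK. The sketch assumes that path names a top-level document field and that a missing field resolves to None.

```python
from typing import Any


def resolve_inputs(
    document: dict[str, Any], input_mappings: list[dict[str, Any]]
) -> dict[str, Any]:
    """Build the retriever input dict for one document (illustrative only)."""
    inputs: dict[str, Any] = {}
    for mapping in input_mappings:
        source = mapping["source"]
        if source["source_type"] == "document_field":
            # Assumption: "path" is a top-level key; missing fields become None.
            inputs[mapping["input_key"]] = document.get(source["path"])
        elif source["source_type"] == "constant":
            # Constants are passed through unchanged for every document.
            inputs[mapping["input_key"]] = source["value"]
        else:
            raise ValueError(f"unknown source_type: {source['source_type']!r}")
    return inputs


input_mappings = [
    {"input_key": "query",
     "source": {"source_type": "document_field", "path": "title"}},
    {"input_key": "collection_id",
     "source": {"source_type": "constant", "value": "col_reference_data"}},
]

# Hypothetical document used only for this example.
document = {"title": "Wireless headphones review", "author": "jdoe"}

inputs = resolve_inputs(document, input_mappings)
print(inputs)
# {'query': 'Wireless headphones review', 'collection_id': 'col_reference_data'}
```

Each document therefore produces its own query value, while collection_id stays the same across the whole collection.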

Write-Back Fields

Configure which retriever result fields to write back to documents:
{
  "write_back_fields": [
    {
      "source_field": "category",
      "target_field": "_enrichment_category",
      "mode": "first"
    },
    {
      "source_field": "related_items",
      "target_field": "_related_ids",
      "mode": "all_as_array"
    }
  ]
}
Write-back modes:

| Mode | Behavior |
| --- | --- |
| first | Write the value from the first result only (default) |
| all_as_array | Collect values from all results into a list |
| concat | Concatenate string values from all results with a ", " separator |
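
The three modes can be illustrated with a small sketch. The apply_write_back helper, the sample results, and the tag/_tags field names are hypothetical; the code mirrors the behavior described above rather than the service's actual implementation. It assumes that results lacking the source field are skipped and that concat ignores non-string values.

```python
from typing import Any


def apply_write_back(
    results: list[dict[str, Any]], write_back_fields: list[dict[str, str]]
) -> dict[str, Any]:
    """Compute the fields to write onto a document (illustrative only)."""
    updates: dict[str, Any] = {}
    for field in write_back_fields:
        source, target = field["source_field"], field["target_field"]
        mode = field.get("mode", "first")  # "first" is the documented default
        if mode == "first":
            # Only the first result is consulted, even if later ones have the field.
            if results and source in results[0]:
                updates[target] = results[0][source]
        elif mode == "all_as_array":
            updates[target] = [r[source] for r in results if source in r]
        elif mode == "concat":
            # Assumption: only string values take part in the concatenation.
            strings = [r[source] for r in results if isinstance(r.get(source), str)]
            updates[target] = ", ".join(strings)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return updates


# Hypothetical retriever results for one document.
results = [
    {"category": "audio", "related_items": "doc_1", "tag": "wireless"},
    {"category": "electronics", "related_items": "doc_2", "tag": "bluetooth"},
]
write_back_fields = [
    {"source_field": "category", "target_field": "_enrichment_category", "mode": "first"},
    {"source_field": "related_items", "target_field": "_related_ids", "mode": "all_as_array"},
    {"source_field": "tag", "target_field": "_tags", "mode": "concat"},
]

updates = apply_write_back(results, write_back_fields)
print(updates)
# {'_enrichment_category': 'audio', '_related_ids': ['doc_1', 'doc_2'], '_tags': 'wireless, bluetooth'}
```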

Execution Control

| Field | Description | Default |
| --- | --- | --- |
| execution_phase | Post-processing phase (1-4) | 4 (Enrichment) |
| priority | Priority within phase (higher = runs first) | 0 |
| scroll_filters | Filter which documents to enrich | None (all documents) |
| enabled | Whether enrichment is active | true |
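
As a rough illustration of how these fields interact, the sketch below orders a list of enrichment configs by phase and then by descending priority, skipping disabled entries. The execution_order helper and the ret_early and ret_disabled IDs are hypothetical, and the tie-breaking rule for equal priorities (original list order) is an assumption, not documented behavior.

```python
from typing import Any


def execution_order(enrichments: list[dict[str, Any]]) -> list[str]:
    """Return retriever IDs in the order they would run (illustrative only)."""
    # Disabled enrichments never run; "enabled" defaults to true.
    active = [e for e in enrichments if e.get("enabled", True)]
    # Lower phase runs first; within a phase, higher priority runs first.
    # sorted() is stable, so ties keep their original list order (assumption).
    ordered = sorted(
        active,
        key=lambda e: (e.get("execution_phase", 4), -e.get("priority", 0)),
    )
    return [e["retriever_id"] for e in ordered]


enrichments = [
    {"retriever_id": "ret_classifier"},                    # defaults: phase 4, priority 0
    {"retriever_id": "ret_brand_lookup", "priority": 10},  # phase 4, runs before priority 0
    {"retriever_id": "ret_early", "execution_phase": 2},   # hypothetical: moved to phase 2
    {"retriever_id": "ret_disabled", "enabled": False},    # hypothetical: skipped
]

order = execution_order(enrichments)
print(order)
# ['ret_early', 'ret_brand_lookup', 'ret_classifier']
```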

Example: LLM Classification at Ingestion

Attach a retriever with an llm_enrich stage to classify documents as they’re ingested:
from mixpeek import Mixpeek

client = Mixpeek(api_key="your-api-key")

# Update collection to add retriever enrichment
client.collections.update(
    collection_id="col_articles",
    retriever_enrichments=[
        {
            "retriever_id": "ret_classifier",
            "input_mappings": [
                {
                    "input_key": "query",
                    "source": {
                        "source_type": "document_field",
                        "path": "content"
                    }
                }
            ],
            "write_back_fields": [
                {
                    "source_field": "category",
                    "target_field": "_llm_category",
                    "mode": "first"
                },
                {
                    "source_field": "sentiment",
                    "target_field": "_llm_sentiment",
                    "mode": "first"
                }
            ],
            "enabled": True
        }
    ]
)

Example: Cross-Collection Join

Use a retriever enrichment to join data from a reference collection, reusing the client created in the previous example:
client.collections.update(
    collection_id="col_products",
    retriever_enrichments=[
        {
            "retriever_id": "ret_brand_lookup",
            "input_mappings": [
                {
                    "input_key": "query",
                    "source": {
                        "source_type": "document_field",
                        "path": "brand_name"
                    }
                }
            ],
            "write_back_fields": [
                {
                    "source_field": "brand_logo_url",
                    "target_field": "_brand_logo",
                    "mode": "first"
                },
                {
                    "source_field": "brand_category",
                    "target_field": "_brand_category",
                    "mode": "first"
                }
            ],
            "enabled": True
        }
    ]
)
Retriever enrichments execute sequentially within each collection to avoid race conditions. For collections with many documents, enrichment time scales linearly with document count.

Comparison with Other Enrichment Types

| Feature | Taxonomies | Clusters | Alerts | Retriever Enrichments |
| --- | --- | --- | --- | --- |
| Purpose | Vector-based classification | Document grouping | Notifications | Arbitrary retriever pipelines |
| Output | Label + score fields | Cluster assignments | Webhook notifications | Configurable field write-back |
| Phase | 1 | 2 | 3 | 4 (default) |
| Use cases | Face matching, entity linking | Segmentation, pattern discovery | Content monitoring | LLM classification, cross-collection joins |