Retrievers are schema‑validated, multi‑stage search pipelines. They combine vector search, metadata filters, grouping, sorting, and optional post‑processing to return ranked results from one or more collections.

Overview

Pipelines

Compose ordered stages (KNN, hybrid, filters, reranking, generation)

Schema

Define an input schema for query‑time inputs

Collections

Target one or more collections for search

Execution

Single execute endpoint with filters, sorts, grouping, selection, pagination

Capabilities

Vector & Hybrid

KNN over specific indexes; hybrid fusion across multiple vectors

Filters & Logic

AND/OR/NOT with typed operators and case sensitivity controls

Sorting & Paging

Sort by score or fields; stable limit/offset

Grouping & Selection

Group by a field; return only requested fields via select

Presigned URLs

Optional asset URLs with return_urls=true

Introspection

List stages and describe features to understand searchable fields

Create a retriever

  • API: Create Retriever
  • Method: POST
  • Path: /v1/retrievers
  • Reference: API Reference
curl -X POST https://api.mixpeek.com/v1/retrievers \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Namespace: ns_123" \
  -H "Content-Type: application/json" \
  -d '{
    "retriever_name": "products_semantic_v1",
    "description": "Semantic search over product catalog",
    "collection_ids": ["col_products_v1"],
    "input_schema": {"properties": {"query_text": {"type": "text"}}},
    "stages": [
      {"stage_name": "knn_search", "version": "1.0.0", "parameters": {"k": 50, "vector_field": "features.text_embedding"}}
    ]
  }'
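
The same request can be assembled from Python. The following is a minimal sketch using only the standard library; it mirrors the curl call above and makes no claim about any official SDK. The helper name `build_create_request` is hypothetical, and the request is built but not sent, so the snippet runs without network access.

```python
import json
import os
import urllib.request

API_BASE = "https://api.mixpeek.com"  # base URL taken from the curl example


def build_create_request(api_key: str, namespace: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/retrievers request.

    Hypothetical helper for illustration; the payload mirrors the curl body.
    """
    payload = {
        "retriever_name": "products_semantic_v1",
        "description": "Semantic search over product catalog",
        "collection_ids": ["col_products_v1"],
        "input_schema": {"properties": {"query_text": {"type": "text"}}},
        "stages": [
            {
                "stage_name": "knn_search",
                "version": "1.0.0",
                "parameters": {"k": 50, "vector_field": "features.text_embedding"},
            }
        ],
    }
    return urllib.request.Request(
        f"{API_BASE}/v1/retrievers",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace,
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_create_request(os.environ.get("API_KEY", ""), "ns_123")
    # To actually send it, uncomment:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
    print(req.get_method(), req.full_url)
```

Keeping the payload in a dictionary makes it straightforward to add further stages to the `stages` list later without editing a JSON string by hand.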

Execute a retriever

  • API: Execute Retriever
  • Method: POST
  • Path: /v1/retrievers/{retriever_identifier}/execute
  • Reference: API Reference
curl -X POST https://api.mixpeek.com/v1/retrievers/RET_ID/execute \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Namespace: ns_123" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {"query_text": "red running shoes"},
    "filters": {"AND": [{"field": "metadata.category", "operator": "eq", "value": "footwear"}]},
    "sorts": [{"field": "score", "direction": "desc"}],
    "group_by": {"field": "metadata.brand", "max_features": 3},
    "select": ["document_id", "metadata.brand", "title"],
    "limit": 10,
    "offset": 0,
    "return_urls": true
  }'
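
A sketch of the execute call in Python, under the same assumptions as the curl example (standard library only, request built but not sent). The helper `build_execute_request` and its `page` / `page_size` arguments are hypothetical conveniences that translate a page number into the `limit` and `offset` fields shown above.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.mixpeek.com"  # base URL taken from the curl example


def build_execute_request(
    api_key: str,
    namespace: str,
    retriever_id: str,
    query_text: str,
    page: int = 0,
    page_size: int = 10,
) -> urllib.request.Request:
    """Build (but do not send) a POST .../execute request for one page of results."""
    payload = {
        "inputs": {"query_text": query_text},
        "filters": {
            "AND": [
                {"field": "metadata.category", "operator": "eq", "value": "footwear"}
            ]
        },
        "sorts": [{"field": "score", "direction": "desc"}],
        "group_by": {"field": "metadata.brand", "max_features": 3},
        "select": ["document_id", "metadata.brand", "title"],
        "limit": page_size,
        "offset": page * page_size,  # page 0 starts at offset 0
        "return_urls": True,
    }
    url = f"{API_BASE}/v1/retrievers/{quote(retriever_id, safe='')}/execute"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace,
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_execute_request("YOUR_API_KEY", "ns_123", "RET_ID", "red running shoes", page=2)
    print(req.full_url)
    print(json.loads(req.data)["offset"])
```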

How it works

1. Inputs

Validated against the retriever's input_schema.

2. Stage execution

Stages run in order, and each one refines the candidate set. Hybrid stages can combine modalities, and filters narrow candidates by metadata.

3. Response shaping

Apply optional grouping, sorting, pagination, and select.

Enable return_urls=true to include presigned URLs for assets in results.
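
To make the response-shaping step concrete, here is a small local simulation in Python. It is an illustration only, not the service's implementation: the function names are hypothetical, and the order of operations (sort, cap each group, paginate, then select) is an assumption.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as 'metadata.brand'; return None if missing."""
    cur: Any = doc
    for key in path.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur


def shape(
    results: List[Dict[str, Any]],
    sort_field: str = "score",
    descending: bool = True,
    group_field: Optional[str] = None,
    per_group: Optional[int] = None,
    offset: int = 0,
    limit: int = 10,
    select: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Sort, cap results per group, paginate, then project the selected fields."""
    # Assumes every result carries the sort field.
    ranked = sorted(results, key=lambda d: get_path(d, sort_field), reverse=descending)
    if group_field is not None:
        seen: Dict[Any, int] = defaultdict(int)
        capped = []
        for doc in ranked:
            key = get_path(doc, group_field)
            if per_group is None or seen[key] < per_group:
                seen[key] += 1
                capped.append(doc)
        ranked = capped
    page = ranked[offset:offset + limit]
    if select:
        page = [{field: get_path(doc, field) for field in select} for doc in page]
    return page


docs = [
    {"document_id": "d1", "score": 0.9, "metadata": {"brand": "A"}},
    {"document_id": "d2", "score": 0.8, "metadata": {"brand": "A"}},
    {"document_id": "d3", "score": 0.7, "metadata": {"brand": "B"}},
    {"document_id": "d4", "score": 0.6, "metadata": {"brand": "A"}},
]
top = shape(docs, group_field="metadata.brand", per_group=1, select=["document_id"])
print(top)  # [{'document_id': 'd1'}, {'document_id': 'd3'}]
```

With a per-group cap of 1, only the best-scoring document of each brand survives, which is why `d2` and `d4` are dropped before pagination.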

Inputs and filters

{
  "AND": [
    {"field": "metadata.category", "operator": "eq", "value": "footwear"},
    {"OR": [
      {"field": "price", "operator": "lte", "value": 100},
      {"field": "on_sale", "operator": "eq", "value": true}
    ]}
  ],
  "case_sensitive": false
}
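
The filter tree above can be read as nested boolean logic. The sketch below evaluates such a tree against a single document in Python, purely to illustrate the semantics. It is not the server's implementation: only the `eq` and `lte` operators from the example are covered, `NOT` is assumed to take a list and mean "none of these match", and `case_sensitive` is assumed to affect string comparisons only.

```python
from typing import Any, Dict


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as 'metadata.category'; return None if missing."""
    cur: Any = doc
    for key in path.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur


# Only the operators used in the example; a real service likely supports more.
OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "lte": lambda actual, expected: actual is not None and actual <= expected,
}


def matches(doc: Dict[str, Any], node: Dict[str, Any], case_sensitive: bool = True) -> bool:
    """Return True if doc satisfies the filter node (a group or a leaf condition)."""
    # A group may override case sensitivity for everything beneath it (assumption).
    case_sensitive = node.get("case_sensitive", case_sensitive)
    if "AND" in node:
        return all(matches(doc, child, case_sensitive) for child in node["AND"])
    if "OR" in node:
        return any(matches(doc, child, case_sensitive) for child in node["OR"])
    if "NOT" in node:
        return not any(matches(doc, child, case_sensitive) for child in node["NOT"])
    actual = get_path(doc, node["field"])
    expected = node["value"]
    if not case_sensitive and isinstance(actual, str) and isinstance(expected, str):
        actual, expected = actual.lower(), expected.lower()
    return OPERATORS[node["operator"]](actual, expected)


example_filter = {
    "AND": [
        {"field": "metadata.category", "operator": "eq", "value": "footwear"},
        {"OR": [
            {"field": "price", "operator": "lte", "value": 100},
            {"field": "on_sale", "operator": "eq", "value": True},
        ]},
    ],
    "case_sensitive": False,
}
doc = {"metadata": {"category": "Footwear"}, "price": 120, "on_sale": True}
print(matches(doc, example_filter))  # True
```

The document passes because the category matches case-insensitively and the `OR` branch is satisfied by `on_sale`, even though the price is above 100.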

Manage retrievers

Best practices

1. Start simple

Begin with a single KNN stage; add hybrid, filters, and reranking as needed.

2. Pre‑filter early

Narrow candidates with metadata filters before expensive vector ops.

3. Select fields

Use select to minimize payload size and latency.

4. Group wisely

Keep per‑group caps small to avoid skew.

5. Version retrievers

Create new retrievers for breaking changes to inputs or stages.
