Pipelines let you chain multiple collections so the output documents of one stage become the input to the next. This enables modular, versioned, and reusable processing for complex multimodal workflows.
Overview
- What: Multi-stage processing where a collection can use another collection as its source
- How: Set the collection's `source.type` to `COLLECTION` and reference the upstream `collection_id`
- Why: Compose extraction, transformation, enrichment, and indexing in clear stages
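The snippet below is a minimal sketch of what an upstream and a downstream collection definition might look like as Python dicts. Only `source.type`, the `BUCKET`/`COLLECTION` values, and `collection_id` come from this page. The other keys (`collection_name`, `bucket_id`, `feature_extractors`) and all ID values are hypothetical placeholders, so check the Create Collection reference for the real request shape.

```python
from typing import Optional

# Hypothetical request bodies; only source.type, "BUCKET"/"COLLECTION",
# and collection_id are taken from the docs. Other keys are assumptions.
upstream = {
    "collection_name": "video_scenes",                          # hypothetical key
    "source": {"type": "BUCKET", "bucket_id": "bkt_example"},   # bucket_id is assumed
    "feature_extractors": ["scene_splitting"],                  # hypothetical shape
}

downstream = {
    "collection_name": "scene_analytics",
    "source": {
        "type": "COLLECTION",
        "collection_id": "col_video_scenes",  # placeholder ID of the upstream collection
    },
    "feature_extractors": ["face_detection", "object_detection"],
}


def upstream_id(collection: dict) -> Optional[str]:
    """Return the upstream collection_id if this collection reads from another collection."""
    source = collection.get("source", {})
    if source.get("type") == "COLLECTION":
        return source.get("collection_id")
    return None  # bucket-sourced (first) stage


print(upstream_id(upstream))    # None
print(upstream_id(downstream))  # col_video_scenes
```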
How chaining works
Each collection declares a source and its own feature extractors. By pointing one collection at another, you build a directed graph of stages.
- The upstream collection writes documents with initial features
- The downstream collection reads those documents as its input and adds more features
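Because the stages form a directed graph, a processing order that respects every upstream dependency is a topological sort. The sketch below uses Python's standard `graphlib` module on the document pipeline from the examples; the `docs_summaries` stage is hypothetical and only shows two consumers reading from the same upstream collection. It is an illustration of the graph structure, not of how the platform schedules work.

```python
from graphlib import CycleError, TopologicalSorter

# Each collection maps to the set of collections it reads from.
# An empty set means the collection reads directly from a bucket.
sources = {
    "docs_raw": set(),
    "docs_nlp": {"docs_raw"},
    "docs_topics": {"docs_nlp"},
    "docs_summaries": {"docs_nlp"},  # hypothetical second consumer of docs_nlp
}

# static_order() yields every stage only after all of its upstream stages.
order = list(TopologicalSorter(sources).static_order())
print(order)

# A misconfigured pipeline where two collections read from each other is not a DAG.
cycle_found = False
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError:
    cycle_found = True
print("cycle detected:", cycle_found)
```

The relative order of `docs_topics` and `docs_summaries` is not fixed, since neither depends on the other; that independence is what allows sibling stages to run in parallel.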
Examples
- Video pipeline: `video_scenes` (BUCKET → scene splitting) → `scene_analytics` (COLLECTION → face and object detection) → `scene_enriched` (COLLECTION → taxonomy enrichment)
- Document pipeline: `docs_raw` (BUCKET → OCR and PDF parsing) → `docs_nlp` (COLLECTION → embeddings and entities) → `docs_topics` (COLLECTION → topic tagging)
- Image pipeline: `images_raw` (BUCKET → EXIF and thumbnails) → `images_semantic` (COLLECTION → CLIP embeddings) → `images_moderation` (COLLECTION → safety labels)
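Each example above is a linear chain: the first stage reads from a bucket and every later stage reads from the one before it. The helper below is a hypothetical sketch that builds the `source` blocks for such a chain. It uses collection names in place of collection IDs for readability; in a real pipeline you would use the `collection_id` returned when the upstream collection is created. The `collection_name` and `bucket_id` keys are assumptions.

```python
from typing import Any


def chain(stage_names: list[str], bucket_id: str) -> list[dict[str, Any]]:
    """Build source configs for a linear pipeline (hypothetical field names).

    The first stage reads from a bucket; every later stage reads from
    the previous stage. Names stand in for real collection IDs here.
    """
    configs = []
    for i, name in enumerate(stage_names):
        if i == 0:
            source = {"type": "BUCKET", "bucket_id": bucket_id}
        else:
            source = {"type": "COLLECTION", "collection_id": stage_names[i - 1]}
        configs.append({"collection_name": name, "source": source})
    return configs


video = chain(["video_scenes", "scene_analytics", "scene_enriched"], "bkt_videos")
for cfg in video:
    print(cfg["collection_name"], "<-", cfg["source"])
```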
What this unlocks
- Modularity: Swap or upgrade a stage without rebuilding the entire flow
- Versioning: Keep `*_v1` and `*_v2` collections side by side for safe rollouts
- Reuse: Share upstream collections across multiple downstream use cases
- Parallelism: Run different enrichments in parallel from the same source
- Observability: Stage-by-stage lineage and targeted reprocessing
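Targeted reprocessing amounts to finding every stage that transitively reads from a changed stage. The sketch below does a breadth-first walk over a hypothetical map of each collection to its upstream source, using the image pipeline from the examples plus a hypothetical `images_semantic_v2` running side by side with v1. It assumes the pipeline has no cycles and is not the platform's own reprocessing logic.

```python
from collections import deque
from typing import Optional

# Hypothetical map: collection -> the collection it reads from (None = bucket source).
parent: dict[str, Optional[str]] = {
    "images_raw": None,
    "images_semantic": "images_raw",
    "images_moderation": "images_semantic",
    "images_semantic_v2": "images_raw",  # hypothetical v2 stage alongside v1
}


def downstream_of(stage: str) -> list[str]:
    """Return every stage that transitively reads from `stage`, breadth-first."""
    children: dict[str, list[str]] = {}
    for child, src in parent.items():
        if src is not None:
            children.setdefault(src, []).append(child)
    result: list[str] = []
    queue = deque([stage])
    while queue:
        for child in children.get(queue.popleft(), []):
            result.append(child)
            queue.append(child)
    return result


print(downstream_of("images_semantic"))  # ['images_moderation']
print(downstream_of("images_raw"))
```

Upgrading `images_semantic` affects only `images_moderation`, while `images_semantic_v2` and the bucket stage are untouched.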
Describe and verify
- Introspect feature addresses for each stage: Describe Collection Features
- List and inspect collections: List Collections, Get Collection
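To verify lineage, you can follow `source.collection_id` links upstream until you reach a stage whose source is not a collection. The sketch below assumes you have already fetched each collection's details (for example with Get Collection) into a dict keyed by ID; apart from `source.type` and `source.collection_id`, the response shape shown is an assumption, and no API call is made here.

```python
# Hypothetical Get Collection results keyed by collection_id.
collections: dict[str, dict] = {
    "docs_raw": {"source": {"type": "BUCKET"}},
    "docs_nlp": {"source": {"type": "COLLECTION", "collection_id": "docs_raw"}},
    "docs_topics": {"source": {"type": "COLLECTION", "collection_id": "docs_nlp"}},
}


def lineage(collection_id: str) -> list[str]:
    """Follow source.collection_id links upstream until a non-COLLECTION source."""
    path = [collection_id]
    seen = {collection_id}
    source = collections[collection_id]["source"]
    while source.get("type") == "COLLECTION":
        upstream = source["collection_id"]
        if upstream in seen:  # guard against a misconfigured cycle
            raise ValueError(f"cycle detected at {upstream}")
        path.append(upstream)
        seen.add(upstream)
        source = collections[upstream]["source"]
    return path


print(" <- ".join(lineage("docs_topics")))  # docs_topics <- docs_nlp <- docs_raw
```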
API references
- Create: Create Collection
- Update: Update Collection
- Delete: Delete Collection
Best practices
- Explicit naming: Use clear stage and version suffixes (e.g., `products_raw_v1`)
- Stable schemas: Keep downstream contracts stable; add new outputs with new versions
- Small stages: Prefer focused extractors per collection for easier upgrades
- Task monitoring: Track processing with Tasks
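If you adopt a `<base>_<stage>_v<N>` naming convention like `products_raw_v1`, a small helper can derive the name for the next version when you roll out a new stage alongside the old one. The pattern below is a hypothetical convention inferred from that example, not a naming rule enforced by the platform.

```python
import re

# Hypothetical convention: <base>_<stage>_v<N>, e.g. products_raw_v1.
NAME_RE = re.compile(r"^(?P<base>[a-z0-9_]+)_(?P<stage>[a-z0-9]+)_v(?P<version>\d+)$")


def next_version(name: str) -> str:
    """Return the next version's name, e.g. products_raw_v1 -> products_raw_v2."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"{name!r} does not follow <base>_<stage>_v<N>")
    return f"{m['base']}_{m['stage']}_v{int(m['version']) + 1}"


print(next_version("products_raw_v1"))  # products_raw_v2
```

Creating the new version as a separate collection, rather than editing the existing one in place, keeps the downstream contract of `*_v1` stable while `*_v2` is validated.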