Deep research workflows orchestrate multiple retriever executions, enrichment passes, and synthesis steps to answer complex questions. Mixpeek’s stage catalog (search, filter, enrich, transform, and compose stages) gives you the primitives to build these flows without bespoke infrastructure.

Building Blocks

| Stage Type | Examples | Use in Research |
|---|---|---|
| Search | semantic_search, hybrid_search, late_interaction_search, web_search | Gather candidate documents across modalities and the open web |
| Filter | filter (structured/text/LLM/custom) | Narrow to relevant time ranges, entities, or sentiment |
| Enrich | join (direct/retriever), taxonomy | Attach structured context, e.g., taxonomy tags or related entities |
| Transform | llm_generation | Summarize, extract key facts, or generate structured notes |
| Compose | retriever, external_api_call | Chain sub-retrievers or call external services (e.g., fact-check APIs) |

Common Patterns

Literature Review

  1. Seed search using hybrid_search to retrieve recent papers.
  2. Structured filter by publication date and venue.
  3. Taxonomy join to classify by research area.
  4. LLM generation stage to summarize findings with citations.
  5. Store summaries alongside feature_id references for auditability.
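As a rough sketch, the literature-review steps could be expressed as an ordered list of stage configurations, following the shape of the retriever@v1 example further down this page. Only the stage names come from the catalog above; the "v1" versions, every parameter key (query_text, limit, strategy, conditions, taxonomy_id, prompt), and the tax_research_areas ID are assumptions to verify against the stage reference.

```python
import json

# Hypothetical sketch: stage names come from the stage catalog; every
# version string, parameter key, and ID below is an assumption.
def literature_review_stages(since_year: int, venues: list[str]) -> list[dict]:
    """Return an ordered list of stage configs for a literature-review retriever."""
    return [
        {   # 1. Seed search over recent papers
            "stage_name": "hybrid_search",
            "version": "v1",
            "parameters": {"query_text": "{{inputs.question}}", "limit": 50},
        },
        {   # 2. Structured filter on publication date and venue
            "stage_name": "filter",
            "version": "v1",
            "parameters": {
                "strategy": "structured",
                "conditions": {
                    "publication_year": {"gte": since_year},
                    "venue": {"in": venues},
                },
            },
        },
        {   # 3. Classify each paper by research area
            "stage_name": "taxonomy",
            "version": "v1",
            "parameters": {"taxonomy_id": "tax_research_areas"},
        },
        {   # 4. Summarize findings and ask the model to cite feature_id values
            "stage_name": "llm_generation",
            "version": "v1",
            "parameters": {
                "prompt": "Summarize the key findings. Cite each claim with its feature_id.",
            },
        },
    ]

if __name__ == "__main__":
    print(json.dumps(literature_review_stages(2022, ["NeurIPS", "ICML"]), indent=2))
```

Step 5 happens in your own application: persist each summary together with the feature_id values the model cited.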

Competitive Intelligence

  1. Use web_search + web_lookup stages to pull public announcements.
  2. Join with internal product docs via join@v1 (retriever strategy) to compare specs.
  3. Apply a custom filter to spotlight price or feature gaps.
  4. Generate a briefing memo with the llm_generation stage.
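The custom filter in step 3 is where the gap logic lives. The plain-Python sketch below shows one possible rule set, independent of how Mixpeek hosts custom filters; ProductSpec, spot_gaps, and the 5% price tolerance are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ProductSpec:
    """Hypothetical normalized spec extracted from a doc or announcement."""
    name: str
    price: float
    features: set[str] = field(default_factory=set)

def spot_gaps(ours: ProductSpec, theirs: ProductSpec, price_tolerance: float = 0.05) -> dict:
    """Flag relative price differences beyond the tolerance and one-sided features."""
    price_delta = (theirs.price - ours.price) / ours.price
    return {
        "price_gap": abs(price_delta) > price_tolerance,
        "price_delta_pct": round(price_delta * 100, 1),
        "missing_features": sorted(theirs.features - ours.features),  # they have, we lack
        "unique_features": sorted(ours.features - theirs.features),   # we have, they lack
    }

if __name__ == "__main__":
    ours = ProductSpec("Ours", 100.0, {"sso", "audit_log"})
    rival = ProductSpec("Rival", 90.0, {"sso", "api_access"})
    print(spot_gaps(ours, rival))
```

Documents flagged this way are the natural input for the briefing memo in step 4.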

Incident Investigation

  1. Collect relevant runbooks/logs via semantic_search over internal collections.
  2. Use filter stages to isolate the incident window.
  3. Enrich with taxonomy-based tags (taxonomy@v1) for impacted systems.
  4. Summarize timeline and root cause via llm_generation, keeping citations.
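Step 2 boils down to a time-window predicate, and step 4 works best on a chronologically ordered, cited input. The plain-Python sketch below shows both; the timestamp, text, and feature_id document fields and the 15-minute padding are hypothetical, and inside Mixpeek you would express the window as a structured filter stage instead.

```python
from datetime import datetime, timedelta, timezone

def in_incident_window(docs: list[dict], start: datetime, end: datetime,
                       padding: timedelta = timedelta(minutes=15)) -> list[dict]:
    """Keep docs whose timestamp falls inside [start - padding, end + padding]."""
    lo, hi = start - padding, end + padding
    return [d for d in docs if lo <= d["timestamp"] <= hi]

def build_timeline(docs: list[dict]) -> list[str]:
    """Order docs chronologically, keeping each feature_id as a citation."""
    return [
        f"{d['timestamp'].isoformat()} {d['text']} [{d['feature_id']}]"
        for d in sorted(docs, key=lambda d: d["timestamp"])
    ]

if __name__ == "__main__":
    t0 = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)  # illustrative incident start
    logs = [
        {"feature_id": "f1", "timestamp": t0 - timedelta(hours=2), "text": "routine deploy"},
        {"feature_id": "f2", "timestamp": t0 + timedelta(minutes=5), "text": "error spike"},
        {"feature_id": "f3", "timestamp": t0 - timedelta(minutes=10), "text": "config change"},
    ]
    for line in build_timeline(in_incident_window(logs, t0, t0 + timedelta(minutes=30))):
        print(line)
```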

Orchestrating Multi-Retriever Flows

Leverage the retriever@v1 compose stage to call sub-retrievers based on previous stage output:
{
  "stage_name": "retriever",
  "version": "v1",
  "parameters": {
    "retriever_id": "ret_internal_logs",
    "input_mappings": {
      "query_text": "{{inputs.primary_question}}",
      "time_range": "{{STAGE.filter.time_range}}"
    },
    "merge_strategy": "append"
  }
}
This pattern lets you create macro retrievers that orchestrate domain-specific sub-searches, enabling modular reuse.
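To make the input_mappings templates concrete, here is a local illustration of how placeholders such as {{inputs.primary_question}} might resolve against a context holding the caller's inputs and earlier stage outputs. This is not Mixpeek's resolver: the dotted-path lookup, the rule that a lone placeholder keeps its value's type, and the context layout are all assumptions.

```python
import re
from typing import Any

# Matches {{ dotted.path }} placeholders (hypothetical syntax rules)
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")

def lookup(context: dict, dotted: str) -> Any:
    """Walk a dotted path such as 'STAGE.filter.time_range' through nested dicts."""
    value: Any = context
    for key in dotted.split("."):
        value = value[key]
    return value

def resolve_mappings(mappings: dict[str, str], context: dict) -> dict[str, Any]:
    """Resolve every template in an input_mappings-style dict."""
    resolved: dict[str, Any] = {}
    for name, template in mappings.items():
        whole = PLACEHOLDER.fullmatch(template)
        if whole:
            # A lone placeholder keeps the original value's type (e.g. a dict time range)
            resolved[name] = lookup(context, whole.group(1))
        else:
            # Placeholders embedded in text are stringified and substituted
            resolved[name] = PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)
    return resolved

if __name__ == "__main__":
    context = {
        "inputs": {"primary_question": "Why did checkout fail?"},
        "STAGE": {"filter": {"time_range": {"start": "2024-05-01T11:00Z", "end": "2024-05-01T13:00Z"}}},
    }
    mappings = {
        "query_text": "{{inputs.primary_question}}",
        "time_range": "{{STAGE.filter.time_range}}",
    }
    print(resolve_mappings(mappings, context))
```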

Capturing Feedback

  • Record user signals with the Interactions API (click, long_view, positive_feedback, etc.).
  • Feed interactions back into rerankers or filter stages (“hide documents seen in this session”).
  • Combine interactions with analytics endpoints to optimize parameter choices (e.g., increase hybrid_search.limit if users often tap beyond top 10).
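The feedback ideas above can be sketched in plain Python: dropping documents already seen in the session, and measuring how often clicks land beyond the top 10 to decide whether to raise hybrid_search.limit. The interaction field names (session_id, document_id, type, position), the 1-based positions, the doubling rule, and the 20% threshold are hypothetical.

```python
def hide_seen(results: list[dict], interactions: list[dict], session_id: str) -> list[dict]:
    """Drop results the user already interacted with in this session."""
    seen = {i["document_id"] for i in interactions if i["session_id"] == session_id}
    return [r for r in results if r["document_id"] not in seen]

def share_beyond(interactions: list[dict], k: int = 10,
                 kinds: tuple[str, ...] = ("click",)) -> float:
    """Fraction of matching interactions whose 1-based position is greater than k."""
    positions = [i["position"] for i in interactions if i["type"] in kinds]
    if not positions:
        return 0.0
    return sum(p > k for p in positions) / len(positions)

def suggest_limit(current_limit: int, interactions: list[dict],
                  k: int = 10, threshold: float = 0.2) -> int:
    """Double the limit when too many clicks land past position k."""
    return current_limit * 2 if share_beyond(interactions, k) > threshold else current_limit
```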

Operational Tips

  1. Persist execution IDs – each execute response includes execution_id; link it to your research session for audit trails.
  2. Monitor stage telemetry – stage_statistics identifies bottlenecks (e.g., LLM stages dominating latency).
  3. Budget controls – set budget_limits on retrievers to cap time or credit consumption for exploratory workflows.
  4. Cache intermediate results – use cache_stage_names for expensive discovery steps, especially when analysts repeat similar queries.
  5. Leverage tasks – schedule enrichment batches (clusters, taxonomies) ahead of time so research pipelines stay low-latency.
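For tip 2, a small helper can point at the dominant stage. The stage_statistics shape assumed here (stage name mapped to a dict containing duration_ms) is hypothetical; adapt the field names to what the execute response actually returns.

```python
def slowest_stage(stage_statistics: dict[str, dict]) -> tuple[str, float]:
    """Return (stage_name, share_of_total_duration) for the slowest stage."""
    if not stage_statistics:
        raise ValueError("no stage statistics")
    total = sum(s["duration_ms"] for s in stage_statistics.values())
    name, stats = max(stage_statistics.items(), key=lambda kv: kv[1]["duration_ms"])
    share = stats["duration_ms"] / total if total else 0.0
    return name, share

if __name__ == "__main__":
    stats = {  # illustrative numbers, not measured data
        "hybrid_search": {"duration_ms": 120},
        "filter": {"duration_ms": 30},
        "llm_generation": {"duration_ms": 1850},
    }
    name, share = slowest_stage(stats)
    print(f"{name} accounts for {share:.0%} of stage time")
```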

Suggested Architecture

Orchestration App
 ├─ Calls macro retriever (with compose stages)
 ├─ Logs execution IDs + user prompts
 ├─ Stores generated summaries & citations
 └─ Sends interactions back to Mixpeek
Behind the scenes, Mixpeek handles stage execution, caching, and lineage tracking. You focus on stitching together the right stages and presenting the synthesized output.
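A minimal sketch of the orchestration app, assuming an injected execute callable so it stays independent of any particular client library. ResearchSession, ExecuteFn, and the response keys execution_id and results are hypothetical; actually sending the queued interactions to the Interactions API is left to your client code.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical signature: (retriever_id, inputs) -> execute response as a dict
# containing at least "execution_id" and "results".
ExecuteFn = Callable[[str, dict], dict]

@dataclass
class ResearchSession:
    session_id: str
    execute: ExecuteFn
    macro_retriever_id: str
    log: list[dict] = field(default_factory=list)
    pending_interactions: list[dict] = field(default_factory=list)

    def ask(self, question: str) -> list[dict]:
        """Run the macro retriever and log the prompt next to its execution_id."""
        response = self.execute(self.macro_retriever_id, {"primary_question": question})
        self.log.append({
            "prompt": question,
            "execution_id": response["execution_id"],  # audit trail link
            "results": response["results"],            # summaries and citations
        })
        return response["results"]

    def record(self, document_id: str, kind: str, position: int) -> None:
        """Queue a user signal (e.g. click, long_view) for later upload."""
        self.pending_interactions.append({
            "session_id": self.session_id,
            "document_id": document_id,
            "type": kind,
            "position": position,
        })

    def drain_interactions(self) -> list[dict]:
        """Hand off queued interactions and reset the queue."""
        batch, self.pending_interactions = self.pending_interactions, []
        return batch

if __name__ == "__main__":
    def fake_execute(retriever_id: str, inputs: dict) -> dict:
        # Stand-in for a real API call, for local experimentation only
        return {"execution_id": "exec_demo", "results": [{"document_id": "d1", "summary": "..."}]}

    session = ResearchSession("s1", fake_execute, "ret_macro")
    session.ask("What changed in the last release?")
    session.record("d1", "click", 1)
    print(session.log[0]["execution_id"], session.drain_interactions())
```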

Next Steps