The LLM Generation retriever stage uses large language models to generate new content based on retrieved documents or custom prompts.

Overview

LLM Generation leverages powerful language models to generate content, summarize documents, answer questions, or perform other text-generation tasks as part of a retrieval pipeline. This stage can enhance search results with AI-generated insights, explanations, or transformations of the retrieved content.

Required Inputs

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `prompt` | string | Yes | - | The prompt template or instruction for the LLM |
| `documents` | array | No | `[]` | Array of document IDs or content to include in context |
| `model` | string | No | `"mixpeek/llm-v1"` | The LLM model to use for generation |
| `max_tokens` | integer | No | `1024` | Maximum number of tokens to generate |
| `temperature` | float | No | `0.7` | Temperature for generation (0.0-1.0) |
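
As a rough illustration of how these inputs and defaults combine, the sketch below merges a user-supplied config with the documented defaults and rejects configs missing the required `prompt`. The helper name `normalize_config` is hypothetical, not part of the actual API.

```python
# Documented defaults for the optional parameters (see table above).
DEFAULTS = {
    "documents": [],
    "model": "mixpeek/llm-v1",
    "max_tokens": 1024,
    "temperature": 0.7,
}

def normalize_config(config: dict) -> dict:
    """Hypothetical sketch: apply defaults and validate a stage config."""
    if "prompt" not in config:
        raise ValueError("'prompt' is required")
    merged = {**DEFAULTS, **config}
    if not 0.0 <= merged["temperature"] <= 1.0:
        raise ValueError("temperature must be in the range 0.0-1.0")
    return merged
```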

Configurations

Generation Modes

| Mode | Description | Use Case |
|---|---|---|
| `standalone` | Generate content based only on the prompt | Creative content, initial responses |
| `document_context` | Use retrieved documents as context for generation | Question answering, summarization |
| `rag` | Retrieval-Augmented Generation with dynamically retrieved content | Knowledge-intensive tasks |
| `agent` | Run as an agent with tool-use capabilities | Complex reasoning, multi-step tasks |

Prompt Templates

The system supports various prompt formats and structures:

| Template Type | Description |
|---|---|
| `simple` | Direct text prompt without special formatting |
| `chat` | JSON array of messages with role and content |
| `jinja2` | Jinja2 template with variables for document content |
| `handlebars` | Handlebars template with document variables |

Configuration Examples

Simple Generation

```json
{
  "mode": "standalone",
  "model": "mixpeek/llm-v1",
  "prompt": "Generate a summary of vector database technology.",
  "max_tokens": 500,
  "temperature": 0.3
}
```
RAG Configuration

```json
{
  "mode": "rag",
  "model": "mixpeek/llm-v2",
  "prompt_template": "Answer the following question based on the provided documents:\n\nQuestion: {{query}}\n\nDocuments:\n{{#each documents}}{{content}}\n\n{{/each}}",
  "template_format": "handlebars",
  "retriever_config": {
    "type": "knn_search",
    "k": 5,
    "feature_store_id": "fs_embeddings_123"
  },
  "max_tokens": 1024,
  "temperature": 0.2
}
```
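
To make the handlebars template above concrete, here is a minimal sketch of how it expands at query time. This toy renderer only handles the two constructs the example uses (`{{query}}` substitution and `{{#each documents}}...{{/each}}` iteration over `{{content}}`); a real handlebars engine does far more.

```python
import re

def render_handlebars(template: str, query: str, documents: list[dict]) -> str:
    """Toy renderer for the subset of handlebars used in the RAG example."""
    def expand_each(match: re.Match) -> str:
        # Repeat the block body once per document, filling in {{content}}.
        body = match.group(1)
        return "".join(body.replace("{{content}}", d["content"]) for d in documents)

    out = re.sub(
        r"\{\{#each documents\}\}(.*?)\{\{/each\}\}",
        expand_each,
        template,
        flags=re.S,
    )
    return out.replace("{{query}}", query)
```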

Model Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `temperature` | float | `0.7` | Controls randomness (0.0-1.0) |
| `top_p` | float | `0.95` | Nucleus sampling parameter |
| `top_k` | integer | `50` | Limits vocabulary for next-token selection |
| `repetition_penalty` | float | `1.0` | Penalizes repeated tokens |
| `max_tokens` | integer | `1024` | Maximum length of generated text |
| `stop_sequences` | array | `[]` | Sequences that stop generation when encountered |

Processing Flow

The stage builds the final prompt from the template and any retrieved document context, invokes the configured model with the specified parameters, and returns the generated text along with the context used and processing metadata.

Output Schema

```json
{
  "generation": {
    "text": "Vector databases are specialized database systems designed to efficiently store and query high-dimensional vectors...",
    "model": "mixpeek/llm-v1",
    "tokens_generated": 487,
    "finish_reason": "length"
  },
  "context": {
    "documents": [
      {
        "document_id": "doc_abc123",
        "collection_id": "col_xyz789",
        "relevance_score": 0.92
      },
      {
        "document_id": "doc_def456",
        "collection_id": "col_xyz789",
        "relevance_score": 0.88
      }
    ],
    "prompt_tokens": 218
  },
  "metadata": {
    "processing_time_ms": 856.2,
    "model_parameters": {
      "temperature": 0.3,
      "max_tokens": 500,
      "top_p": 0.95
    }
  }
}
```
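
A few fields in this schema are worth acting on downstream: `finish_reason` of `"length"` means generation was cut off at `max_tokens`, and the `context.documents` array identifies which sources grounded the answer. A hedged sketch of consuming the result (the helper `summarize_result` and its output shape are illustrative):

```python
def summarize_result(result: dict) -> dict:
    """Hypothetical sketch: pull the useful fields out of a stage result."""
    gen = result["generation"]
    docs = result.get("context", {}).get("documents", [])
    return {
        "text": gen["text"],
        # "length" means the model hit max_tokens before finishing.
        "truncated": gen["finish_reason"] == "length",
        # Total token cost = prompt tokens + generated tokens.
        "total_tokens": gen["tokens_generated"] + result["context"]["prompt_tokens"],
        "sources": [d["document_id"] for d in docs],
    }
```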