# LLM Generation
Generate content, summaries, or insights using large language models
The LLM Generation retriever stage uses large language models to generate new content based on retrieved documents or custom prompts.
## Overview
LLM Generation leverages powerful language models to generate content, summarize documents, answer questions, or perform other text-generation tasks as part of a retrieval pipeline. This stage can enhance search results with AI-generated insights, explanations, or transformations of the retrieved content.
## Required Inputs
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | - | The prompt template or instruction for the LLM |
| documents | array | No | [] | Array of document IDs or content to include in context |
| model | string | No | "mixpeek/llm-v1" | The LLM model to use for generation |
| max_tokens | integer | No | 1024 | Maximum number of tokens to generate |
| temperature | float | No | 0.7 | Temperature for generation (0.0-1.0) |
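For reference, a minimal stage configuration that sets the required `prompt` plus a few optional overrides might look like the sketch below. The `stage` and `parameters` wrapper keys are illustrative assumptions, not a confirmed schema:

```json
{
  "stage": "llm_generation",
  "parameters": {
    "prompt": "Summarize the key findings in the retrieved documents.",
    "model": "mixpeek/llm-v1",
    "max_tokens": 512,
    "temperature": 0.3
  }
}
```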
## Configurations
### Generation Modes
| Mode | Description | Use Case |
|---|---|---|
| standalone | Generate content based only on the prompt | Creative content, initial responses |
| document_context | Use retrieved documents as context for generation | Question answering, summarization |
| rag | Retrieval-Augmented Generation with dynamically retrieved content | Knowledge-intensive tasks |
| agent | Run as an agent with tool-use capabilities | Complex reasoning, multi-step tasks |
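As an example, assuming the mode is selected with a `mode` parameter (an assumption; the exact key is not documented here), a document-grounded configuration could look like:

```json
{
  "mode": "document_context",
  "prompt": "Answer the user's question using only the documents provided.",
  "documents": ["doc_123", "doc_456"]
}
```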
### Prompt Templates
The system supports various prompt formats and structures:
| Template Type | Description |
|---|---|
| simple | Direct text prompt without special formatting |
| chat | JSON array of messages with role and content |
| jinja2 | Jinja2 template with variables for document content |
| handlebars | Handlebars template with document variables |
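As an illustration, a `chat` prompt is a JSON array of role/content messages, while a `jinja2` prompt would interpolate document content with expressions such as `{{ doc.content }}` (the variable names are assumptions):

```json
[
  {"role": "system", "content": "You are a concise research assistant."},
  {"role": "user", "content": "Summarize the retrieved documents in three bullet points."}
]
```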
## Configuration Examples
### Simple Generation
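A minimal standalone configuration, using the parameter names documented above (the `mode` key is an illustrative assumption), might look like:

```json
{
  "mode": "standalone",
  "prompt": "Write a two-sentence overview of vector search.",
  "model": "mixpeek/llm-v1",
  "max_tokens": 256,
  "temperature": 0.7
}
```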
### RAG Configuration
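A RAG-style setup combines a document-aware prompt template with tighter sampling. In this sketch the `template_type` key and the template variables `documents` and `query` are illustrative assumptions:

```json
{
  "mode": "rag",
  "template_type": "jinja2",
  "prompt": "Answer the question using the context below.\n{% for doc in documents %}- {{ doc.content }}\n{% endfor %}\nQuestion: {{ query }}",
  "model": "mixpeek/llm-v1",
  "max_tokens": 1024,
  "temperature": 0.2,
  "stop_sequences": ["\n\nQuestion:"]
}
```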
## Model Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| temperature | float | 0.7 | Controls randomness (0.0-1.0) |
| top_p | float | 0.95 | Nucleus sampling parameter |
| top_k | integer | 50 | Limits vocabulary for next-token selection |
| repetition_penalty | float | 1.0 | Penalizes repeated tokens |
| max_tokens | integer | 1024 | Maximum length of generated text |
| stop_sequences | array | [] | Sequences that stop generation when encountered |
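For deterministic, extractive output you would typically lower the temperature and tighten the sampling parameters. One plausible combination (values here are illustrative, not recommended defaults):

```json
{
  "temperature": 0.1,
  "top_p": 0.9,
  "top_k": 40,
  "repetition_penalty": 1.1,
  "max_tokens": 512,
  "stop_sequences": ["###"]
}
```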
## Processing Flow
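At a high level, a generation stage of this kind typically proceeds as follows (a general sketch, not a confirmed internal specification):

1. Receive the prompt and any configured parameters from the pipeline.
2. Resolve the `documents` input when a document-aware mode is selected (IDs are fetched; inline content is used as-is).
3. Render the prompt template (simple, chat, jinja2, or handlebars) with the document context.
4. Call the configured model with the generation parameters (temperature, top_p, max_tokens, and so on).
5. Apply stop sequences and token limits, then pass the generated text to the next stage.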
## Output Schema
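A stage like this plausibly returns the generated text plus generation metadata; all field names below are illustrative assumptions rather than a confirmed schema:

```json
{
  "generated_text": "Vector search retrieves items by embedding similarity rather than keyword overlap.",
  "model": "mixpeek/llm-v1",
  "usage": {
    "prompt_tokens": 142,
    "completion_tokens": 87
  },
  "finish_reason": "stop"
}
```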