Execute raw inference with provider and model parameters.
This endpoint provides direct access to inference services without the retriever framework overhead. Ideal for simple LLM calls, embeddings, transcription, or vision tasks.
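Examples:
# Chat completion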
{
"provider": "openai",
"model": "gpt-4o-mini",
"inputs": {"prompts": ["What is AI?"]},
"parameters": {"temperature": 0.7, "max_tokens": 500}
}
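# Text embedding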
{
"provider": "openai",
"model": "text-embedding-3-large",
"inputs": {"text": "machine learning"},
"parameters": {}
}
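# Multimodal text embedding (Google)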
{
"provider": "google",
"model": "multimodalembedding",
"inputs": {"text": "machine learning"},
"parameters": {}
}
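# Multimodal image embedding (image URL)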
{
"provider": "google",
"model": "multimodalembedding",
"inputs": {"image_url": "https://example.com/image.jpg"},
"parameters": {}
}
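# Multimodal image embedding (base64)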
{
"provider": "google",
"model": "multimodalembedding",
"inputs": {"image_base64": "<base64-encoded-image>"},
"parameters": {}
}
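# Multimodal video embedding (video URL)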
{
"provider": "google",
"model": "multimodalembedding",
"inputs": {"video_url": "https://example.com/video.mp4"},
"parameters": {}
}
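# Multimodal video embedding (base64)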
{
"provider": "google",
"model": "multimodalembedding",
"inputs": {"video_base64": "<base64-encoded-video>"},
"parameters": {}
}
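# Audio transcription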
{
"provider": "openai",
"model": "whisper-1",
"inputs": {"audio_url": "https://example.com/audio.mp3"},
"parameters": {}
}
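# Vision (multimodal)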
{
"provider": "openai",
"model": "gpt-4o",
"inputs": {
"prompts": ["Describe this image"],
"image_url": "https://example.com/image.jpg"
},
"parameters": {"temperature": 0.5}
}
Args:
    request: FastAPI request object (populated by middleware)
    payload: Raw inference request

Returns:
    Inference response with results and metadata

Raises:
    400 Bad Request: Invalid provider, model, or inputs
    401 Unauthorized: Missing or invalid API key
    429 Too Many Requests: Rate limit exceeded
    500 Internal Server Error: Inference execution failed
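These statuses can be handled on the client side. The sketch below is an illustrative wrapper, not part of any official SDK: it retries with exponential backoff on 429 and raises on the other statuses, assuming the endpoint is called directly with the requests library.

import time
import requests

def post_inference_with_retry(url: str, headers: dict, payload: dict, max_attempts: int = 4) -> dict:
    """POST an inference payload, retrying on 429 (rate limit)."""
    for attempt in range(max_attempts):
        resp = requests.post(url, headers=headers, json=payload, timeout=60)
        if resp.status_code == 429 and attempt < max_attempts - 1:
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
            continue
        if resp.status_code in (400, 401, 429, 500):
            # invalid request, bad credentials, exhausted rate-limit retries, or provider-side failure
            raise RuntimeError(f"Inference failed ({resp.status_code}): {resp.text}")
        resp.raise_for_status()
        return resp.json()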
REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.
"Bearer YOUR_API_KEY"
"Bearer YOUR_STRIPE_API_KEY"
REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'
"ns_abc123def456"
"production"
"my-namespace"
Request for raw inference without retriever framework.
This endpoint provides direct access to inference services with minimal configuration. Ideal for simple LLM calls, embeddings, transcription, or vision tasks without requiring collection setup or retriever configuration.
Examples:
# Chat completion
{
"provider": "openai",
"model": "gpt-4o-mini",
"inputs": {"prompts": ["What is AI?"]},
"parameters": {"temperature": 0.7, "max_tokens": 500}
}
# Text embedding
{
"provider": "openai",
"model": "text-embedding-3-large",
"inputs": {"text": "machine learning"},
"parameters": {}
}
# Audio transcription
{
"provider": "openai",
"model": "whisper-1",
"inputs": {"audio_url": "https://example.com/audio.mp3"},
"parameters": {}
}
# Vision (multimodal)
{
"provider": "openai",
"model": "gpt-4o",
"inputs": {
"prompts": ["Describe this image"],
"image_url": "https://example.com/image.jpg"
},
"parameters": {"temperature": 0.5}
}
Provider name: openai, google, anthropic
"openai"
"google"
"anthropic"
Model identifier specific to the provider
"gpt-4o-mini"
"gemini-1.5-flash"
"claude-3-5-sonnet"
"text-embedding-3-large"
"whisper-1"
Model-specific inputs. Chat: {prompts: [str]}, Embeddings: {text: str} or {texts: [str]}, Transcription: {audio_url: str}, Vision: {prompts: [str], image_url: str}
{
"prompts": ["What is the capital of France?"]
}
{
"text": "machine learning"
}
{
"audio_url": "https://example.com/audio.mp3"
}
Optional parameters for inference. Common: temperature (float), max_tokens (int), schema (dict for structured output)
{ "max_tokens": 500, "temperature": 0.7 }Successful Response
Successful Response
Response from raw inference.
Returns the inference results along with metadata about the request.
Inference results (structure varies by modality)
Provider that was used
Model that was used
Total inference latency in milliseconds
Token usage statistics (if available)
{
"completion": 120,
"prompt": 15,
"total": 135
}
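As a rough guide to consuming this response, the helper below prints the fields described above. The top-level key names ("results", "provider", "model", "latency_ms", "usage") are assumptions inferred from the field descriptions, so verify them against a real response body.

def summarize_inference_response(data: dict) -> None:
    """Print the documented response fields from a parsed response body."""
    print("provider:", data.get("provider"))          # provider that was used
    print("model:", data.get("model"))                # model that was used
    print("latency:", data.get("latency_ms"), "ms")   # total inference latency (key name assumed)
    usage = data.get("usage")                         # token usage, present only when available
    if usage:
        print(f"tokens: prompt={usage['prompt']} completion={usage['completion']} total={usage['total']}")
    print("results:", data.get("results"))            # structure varies by modality (key name assumed)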