POST /v1/inference

Execute Raw Inference
Example request:

curl --request POST \
  --url https://api.mixpeek.com/v1/inference \
  --header 'Authorization: Bearer sk_xxxxxxxxxxxxx' \
  --header 'X-Namespace: my-namespace' \
  --header 'Content-Type: application/json' \
  --data '
{
  "provider": "<string>",
  "model": "<string>",
  "inputs": {},
  "parameters": {
    "max_tokens": 500,
    "temperature": 0.7
  }
}
'
Example response:

{
  "data": "<unknown>",
  "provider": "<string>",
  "model": "<string>",
  "latency_ms": 123,
  "tokens_used": {
    "completion": 120,
    "prompt": 15,
    "total": 135
  }
}

Headers

Authorization
string

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

X-Namespace
string

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or the namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'.
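Put together, the two required headers look like this on a request (the key and namespace values are placeholders; substitute your own):

curl --request POST \
  --url https://api.mixpeek.com/v1/inference \
  --header 'Authorization: Bearer sk_xxxxxxxxxxxxx' \
  --header 'X-Namespace: my-namespace' \
  --header 'Content-Type: application/json' \
  --data '{"provider": "openai", "model": "gpt-4o-mini", "inputs": {"prompts": ["What is AI?"]}, "parameters": {}}'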

Body

application/json

Request body for raw inference without the retriever framework.

This endpoint provides direct access to inference services with minimal configuration. Ideal for simple LLM calls, embeddings, transcription, or vision tasks without requiring collection setup or retriever configuration.

Examples:

# Chat completion
{
  "provider": "openai",
  "model": "gpt-4o-mini",
  "inputs": {"prompts": ["What is AI?"]},
  "parameters": {"temperature": 0.7, "max_tokens": 500}
}

# Text embedding
{
  "provider": "openai",
  "model": "text-embedding-3-large",
  "inputs": {"text": "machine learning"},
  "parameters": {}
}

# Audio transcription
{
  "provider": "openai",
  "model": "whisper-1",
  "inputs": {"audio_url": "https://example.com/audio.mp3"},
  "parameters": {}
}

# Vision (multimodal)
{
  "provider": "openai",
  "model": "gpt-4o",
  "inputs": {
    "prompts": ["Describe this image"],
    "image_url": "https://example.com/image.jpg"
  },
  "parameters": {"temperature": 0.5}
}
provider
string
required

Provider name. One of: openai, google, anthropic

model
string
required

Model identifier specific to the provider

inputs
Inputs · object
required

Model-specific inputs. Chat: {prompts: [str]}, Embeddings: {text: str} or {texts: [str]}, Transcription: {audio_url: str}, Vision: {prompts: [str], image_url: str}
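
For example, the batch form of the embeddings input described above would look like this (a sketch extrapolated from the single-text example, using the {texts: [str]} shape documented above):

# Batch text embedding
{
  "provider": "openai",
  "model": "text-embedding-3-large",
  "inputs": {"texts": ["machine learning", "deep learning"]},
  "parameters": {}
}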

parameters
Parameters · object

Optional parameters for inference. Common: temperature (float), max_tokens (int), schema (dict for structured output)

Example:
{ "max_tokens": 500, "temperature": 0.7 }

Response

Successful Response

Response from raw inference.

Returns the inference results along with metadata about the request.

data
any
required

Inference results (structure varies by modality)
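
Because the structure varies, the following are illustrations only, not guaranteed shapes (both are assumptions about provider output):

# Chat completion (illustrative)
"data": "AI is the study and engineering of systems that perform tasks requiring intelligence."

# Embedding (illustrative)
"data": [0.0123, -0.0456, 0.0789]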

provider
string
required

Provider that was used

model
string
required

Model that was used

latency_ms
number
required

Total inference latency in milliseconds

tokens_used
Tokens Used · object

Token usage statistics (if available)

Example:
{
  "completion": 120,
  "prompt": 15,
  "total": 135
}
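
To pull the latency and usage metadata out of a response on the command line, a small jq filter works. This sketch assumes jq is installed and uses // 0 as a fallback because tokens_used may be absent for some modalities:

curl --request POST \
  --url https://api.mixpeek.com/v1/inference \
  --header 'Authorization: Bearer sk_xxxxxxxxxxxxx' \
  --header 'X-Namespace: my-namespace' \
  --header 'Content-Type: application/json' \
  --data '{"provider": "openai", "model": "gpt-4o-mini", "inputs": {"prompts": ["What is AI?"]}, "parameters": {}}' \
  | jq '{latency_ms, total_tokens: (.tokens_used.total // 0)}'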