Mixpeek provides two ways to run inference: built-in models served on Ray Serve, and custom models that you upload and deploy yourself. This page covers both approaches, with a focus on the custom model workflow for Enterprise users.

Built-in Models

Mixpeek ships with a curated set of models optimized for common tasks:
| Category | Examples | Primary Use |
|---|---|---|
| Embeddings | `multilingual-e5-large-instruct`, `gte-modernbert-base`, `clip_vit_l_14` | Semantic search, multimodal similarity |
| Sparse | `splade_v1` | Hybrid and lexical search |
| Multi-Vector | `colbertv2` | Late interaction retrieval |
| Rerankers | `bge-reranker-v2-m3`, `cross-encoder` | Reordering search results |
| Generation | `gpt-4`, `claude-3-opus`, `gemini-pro` | Summaries, transformations |
| Audio | `whisper_large_v3_turbo`, `pyannote-segmentation` | Transcription, diarization |
Call `/v1/feature-extractors` and `/v1/retrievers/stages` to discover supported models programmatically.

Custom Models (Enterprise)

Custom models require an Enterprise tier subscription. Contact sales to upgrade.
Custom models let you bring your own trained weights—fine-tuned embeddings, domain-specific classifiers, or proprietary architectures—and run them on Mixpeek’s infrastructure with zero-copy sharing across workers.

How It Works

  1. Upload your model archive (.tar.gz) containing weights
  2. Deploy to the Ray object store for fast access
  3. Use in custom plugins via the model loader API
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Upload    │────▶│   Deploy    │────▶│    Use      │
│  (S3 store) │     │ (Ray cache) │     │ (Inference) │
└─────────────┘     └─────────────┘     └─────────────┘
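The quickstart below exercises three endpoints that follow one URL pattern; a minimal helper to build them (the base URL here is a placeholder, not an official host):

```python
def model_endpoints(base_url: str, namespace_id: str, model_id: str) -> dict:
    """Build the upload, deploy, and list URLs used in the quickstart."""
    models = f"{base_url}/v1/namespaces/{namespace_id}/models"
    return {
        "upload": models,                         # POST multipart form with the archive
        "deploy": f"{models}/{model_id}/deploy",  # POST to warm the Ray object store
        "list": models,                           # GET to enumerate uploaded models
    }

# "https://your-mixpeek-host" stands in for $MIXPEEK_API_URL
urls = model_endpoints("https://your-mixpeek-host", "ns_abc123", "my-embedding-model_1_0_0")
```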

Supported Formats

| Format | Extension | Description |
|---|---|---|
| pytorch | `.pt`, `.pth` | PyTorch state_dict or TorchScript |
| safetensors | `.safetensors` | SafeTensors format (recommended) |
| onnx | `.onnx` | ONNX Runtime format |
| huggingface | directory | HuggingFace model directory |

Quickstart

1. Create a Model Archive

Package your model weights into a .tar.gz archive:
```python
import torch
import tarfile
import io

# Train or load your model
model = torch.nn.Linear(768, 256)

# Save weights
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Create archive
with tarfile.open("my_model.tar.gz", "w:gz") as tar:
    info = tarfile.TarInfo(name="model.pt")
    info.size = len(buffer.getvalue())
    tar.addfile(info, buffer)
```
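Before uploading, it can help to confirm the archive actually contains the expected weights file; a small sketch (the filename `model.pt` matches the archive built above):

```python
import tarfile

def archive_members(path: str) -> list[str]:
    """Return the member names of a .tar.gz model archive."""
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()

# Example: the quickstart archive should contain exactly one entry, model.pt
# assert "model.pt" in archive_members("my_model.tar.gz")
```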

2. Upload the Model

```bash
curl -X POST "$MIXPEEK_API_URL/v1/namespaces/$NAMESPACE_ID/models" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -F "file=@my_model.tar.gz" \
  -F "name=my-embedding-model" \
  -F "version=1.0.0" \
  -F "model_format=pytorch" \
  -F "framework=pytorch" \
  -F "task_type=embedding" \
  -F "num_gpus=0" \
  -F "memory_gb=4.0"
```
Response:
```json
{
  "success": true,
  "model_id": "my-embedding-model_1_0_0",
  "deployment_status": "pending",
  "endpoint": "/models/ns_abc123/my-embedding-model_1_0_0",
  "model_archive_url": "s3://mixpeek/..."
}
```

3. Deploy to Ray Object Store

Pre-load your model into the distributed cache for fast inference:
```bash
curl -X POST "$MIXPEEK_API_URL/v1/namespaces/$NAMESPACE_ID/models/my-embedding-model_1_0_0/deploy" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY"
```
Response:
```json
{
  "success": true,
  "model_id": "my-embedding-model_1_0_0",
  "namespace_id": "ns_abc123",
  "deployment_status": "deployed",
  "cached": true,
  "message": "Model my-embedding-model_1_0_0 loaded into Ray object store"
}
```

4. Use in Custom Plugins

Load your model in a custom plugin with zero-copy access:
```python
from engine.models.loader import load_namespace_model
import torch

class MyCustomProcessor:
    def __init__(self):
        # Load pre-uploaded weights (cached in Ray object store)
        weights = load_namespace_model("my-embedding-model_1_0_0")

        # Initialize model architecture
        self.model = torch.nn.Linear(768, 256)
        self.model.load_state_dict(weights)
        self.model.eval()

    def process(self, text_embedding):
        with torch.no_grad():
            return self.model(text_embedding)
```

Examples

Upload a HuggingFace Model

```bash
# Package a HuggingFace model directory
tar -czvf my-bert.tar.gz ./my-bert-model/

# Upload
curl -X POST "$MIXPEEK_API_URL/v1/namespaces/$NAMESPACE_ID/models" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -F "file=@my-bert.tar.gz" \
  -F "name=my-fine-tuned-bert" \
  -F "version=2.0.0" \
  -F "model_format=huggingface" \
  -F "framework=sentence-transformers" \
  -F "task_type=embedding" \
  -F "num_gpus=1" \
  -F "memory_gb=8.0"
```

Upload an ONNX Model

```bash
curl -X POST "$MIXPEEK_API_URL/v1/namespaces/$NAMESPACE_ID/models" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -F "file=@product-classifier.tar.gz" \
  -F "name=product-classifier" \
  -F "version=1.0.0" \
  -F "model_format=onnx" \
  -F "task_type=classification" \
  -F "num_gpus=0" \
  -F "memory_gb=2.0"
```

List All Models

```bash
curl "$MIXPEEK_API_URL/v1/namespaces/$NAMESPACE_ID/models" \
  -H "Authorization: Bearer $MIXPEEK_API_KEY"
```
Response:
```json
{
  "success": true,
  "models": [
    {
      "model_id": "my-embedding-model_1_0_0",
      "name": "my-embedding-model",
      "version": "1.0.0",
      "model_format": "pytorch",
      "deployed": true,
      "created_at": "2025-01-10T12:00:00Z"
    }
  ],
  "total": 1
}
```
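When iterating on versions, it is handy to pick out which uploads are actually deployed rather than just pending; a sketch over the response shape shown above:

```python
def deployed_model_ids(response: dict) -> list[str]:
    """Collect model_ids with deployed=true from a list-models response."""
    return [m["model_id"] for m in response.get("models", []) if m.get("deployed")]

resp = {
    "success": True,
    "models": [
        {"model_id": "my-embedding-model_1_0_0", "deployed": True},
        {"model_id": "my-embedding-model_1_1_0", "deployed": False},
    ],
    "total": 2,
}
# deployed_model_ids(resp) -> ["my-embedding-model_1_0_0"]
```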

Python SDK Example

```python
from mixpeek import Mixpeek

client = Mixpeek(api_key="sk_...")

# Upload model
with open("my_model.tar.gz", "rb") as f:
    result = client.models.upload(
        namespace_id="ns_abc123",
        file=f,
        name="my-reranker",
        version="1.0.0",
        model_format="pytorch",
        task_type="reranking",
    )

# Deploy model
client.models.deploy(
    namespace_id="ns_abc123",
    model_id=result.model_id,
)

# List models
models = client.models.list(namespace_id="ns_abc123")
```

Model Versioning

Models are versioned independently, allowing safe rollouts:
```text
my-embedding-model_1_0_0  (production)
my-embedding-model_1_1_0  (staging)
my-embedding-model_2_0_0  (development)
```
Recommended rollout process:
  1. Upload new version alongside existing
  2. Deploy to Ray object store
  3. Update staging plugins to use new version
  4. Monitor performance via Analytics API
  5. Gradually shift production traffic
  6. Delete old versions when validated
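The responses above suggest that model IDs are derived from the name plus the version with dots replaced by underscores; a small helper assuming that convention holds (verify against your own upload responses):

```python
def model_id(name: str, version: str) -> str:
    """Derive a model_id like 'my-embedding-model_1_0_0' (assumed convention)."""
    return f"{name}_{version.replace('.', '_')}"

# model_id("my-embedding-model", "1.1.0") -> "my-embedding-model_1_1_0"
```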

Resource Requirements

When uploading, specify resource requirements for optimal scheduling:
| Parameter | Description | Default |
|---|---|---|
| `num_cpus` | CPU cores required | 1.0 |
| `num_gpus` | GPU devices required | 0 |
| `memory_gb` | Memory allocation in GB | 4.0 |
```bash
# GPU model with high memory
-F "num_gpus=1" \
-F "memory_gb=16.0"

# CPU-only lightweight model
-F "num_cpus=0.5" \
-F "num_gpus=0" \
-F "memory_gb=2.0"
```
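A sketch that fills in the documented defaults for any parameters you omit, so a client only has to pass what differs (defaults taken from the table above; treat them as a starting point):

```python
RESOURCE_DEFAULTS = {"num_cpus": 1.0, "num_gpus": 0, "memory_gb": 4.0}

def resource_spec(**overrides) -> dict:
    """Merge user overrides onto the documented resource defaults."""
    unknown = set(overrides) - set(RESOURCE_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown resource parameters: {sorted(unknown)}")
    return {**RESOURCE_DEFAULTS, **overrides}

# resource_spec(num_gpus=1, memory_gb=16.0)
# -> {"num_cpus": 1.0, "num_gpus": 1, "memory_gb": 16.0}
```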

Limits & Quotas

| Limit | Value |
|---|---|
| Max models per namespace | 50 |
| Max archive size | 10 GB |
| Supported formats | pytorch, safetensors, onnx, huggingface |
| Required tier | Enterprise |


Troubleshooting

"Custom models require Enterprise tier"

Your organization must be on the Enterprise plan to use custom models. Contact sales to upgrade.

Model deployment fails

  1. Verify the archive format is valid .tar.gz
  2. Check that weights match the declared model_format
  3. Ensure sufficient memory is allocated
  4. Check engine health via /v1/health
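Steps 1 and 2 above can be checked locally before re-uploading; a pre-flight sketch using the extensions from the Supported Formats table (huggingface archives are directory-based, so any non-empty archive passes):

```python
import tarfile

# Extensions per format, from the Supported Formats table
FORMAT_EXTENSIONS = {
    "pytorch": (".pt", ".pth"),
    "safetensors": (".safetensors",),
    "onnx": (".onnx",),
}

def preflight(path: str, model_format: str) -> bool:
    """Check the archive opens as tar.gz and holds a file matching the format."""
    try:
        with tarfile.open(path, "r:gz") as tar:
            names = tar.getnames()
    except (tarfile.ReadError, OSError):
        return False
    if model_format == "huggingface":
        return bool(names)  # directory layout, no single required extension
    exts = FORMAT_EXTENSIONS.get(model_format, ())
    return any(name.endswith(exts) for name in names)
```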

Model not found in plugin

  1. Verify the model was deployed (not just uploaded)
  2. Check deployed: true in model details
  3. Ensure namespace_id matches in plugin configuration

Slow inference

  1. Pre-deploy models before inference to warm the cache
  2. Check cached: true in deployment response
  3. Consider increasing num_gpus for large models

Custom models unlock the full power of Mixpeek’s distributed inference infrastructure with your own trained weights. Upload once, deploy to the Ray object store, and access with zero-copy sharing across all your custom plugins.