Whichever Mixpeek method you call (mixpeek.extract(), mixpeek.embed(), or mixpeek.generate()), you specify the model in each request. For example:

mixpeek.embed("jinaai/jina-embeddings-v2-base-en")

The tables below list the models currently supported in the SDK. For embed, use the model’s dimensions when creating your DB index.

We add models regularly, so reach out if one you need is missing.

Models

Embedding

Name                                        Modality     Dimensions  URL
sentence-transformers/all-MiniLM-L6-v2      Text         384         Sentence Transformers
nomic-ai/nomic-embed-text-v1                Text         768         Nomic AI
jinaai/jina-embeddings-v2-base-en           Text         768         Jina AI
google-bert/bert-base-multilingual-uncased  Text         768         Google BERT
mixedbread-ai/mxbai-embed-large-v1          Text         1024        MixedBread AI
openai/clip-vit-base-patch32                Text, Image  512         OpenAI Clip
mixpeek/vuse-generic-v1                     Text, Video  768         Mixpeek
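
Since the index dimension has to match the size of the vectors a model produces, it can help to keep the embedding table in code. The following is a minimal sketch, not part of the Mixpeek SDK: the dictionary and the helpers index_dimensions and check_vector are hypothetical names, and the values are copied from the table above.

```python
from typing import Sequence

# Output sizes copied from the embedding table above.
EMBEDDING_DIMENSIONS = {
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "nomic-ai/nomic-embed-text-v1": 768,
    "jinaai/jina-embeddings-v2-base-en": 768,
    "google-bert/bert-base-multilingual-uncased": 768,
    "mixedbread-ai/mxbai-embed-large-v1": 1024,
    "openai/clip-vit-base-patch32": 512,
    "mixpeek/vuse-generic-v1": 768,
}


def index_dimensions(model: str) -> int:
    """Return the vector size to configure on a DB index for `model`."""
    if model not in EMBEDDING_DIMENSIONS:
        raise ValueError(f"unknown embedding model: {model!r}")
    return EMBEDDING_DIMENSIONS[model]


def check_vector(model: str, vector: Sequence[float]) -> None:
    """Raise if `vector` does not have the size expected for `model`."""
    expected = index_dimensions(model)
    if len(vector) != expected:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, got {len(vector)}"
        )
```

Calling index_dimensions("jinaai/jina-embeddings-v2-base-en") returns 768, which is the value you would pass when creating the index. Running check_vector before each write catches the case where the model was switched but the index was not rebuilt.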

Generation

Name                        Modality  URL
openai/gpt-3.5-turbo        Text      GPT-3.5 Turbo
openai/gpt-4-turbo-preview  Text      GPT-4 Turbo Preview
openai/gpt-4-turbo          Text      GPT-4 Turbo

Extraction

Work in progress

Fine-Tuning

Available to Enterprise customers only

To fine-tune any of the models (extract, generate, and embed), follow these steps:

1. Send Annotated Data

Upload your annotated data to an S3 bucket and connect the bucket via the Connections service, which returns a connection_id. Make sure the data is well organized and labeled according to the model’s requirements (we’ll provide the specs).

2. Initiate Fine-Tuning

Use the Mixpeek API to start a fine-tuning job. Specify the base model_id, the connection_id for your S3 bucket, and the specs that tell the fine-tuner how to run:

mixpeek.models.tune(
  model_id="jinaai/jina-embeddings-v2-base-en",
  annotation={
    "connection_id": "conn_123",
    "specs": "specs.json"
  }
)

This returns a new model_id (for example, model_1askdh2390) that you can use in your pipeline and pass to your methods:

mixpeek.embed.text(model_id="model_1askdh2390", input="hello")
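
To keep the tune call from being sent with a missing field, the arguments can be assembled and checked first. This is a minimal sketch that assumes the request has exactly the shape shown in the example above. build_tune_request is a hypothetical helper, and its non-empty checks are an assumption rather than documented Mixpeek behavior.

```python
def build_tune_request(model_id: str, connection_id: str, specs: str) -> dict:
    """Assemble keyword arguments shaped like the tune call shown above.

    Hypothetical helper: the non-empty checks are an assumption, not
    documented Mixpeek validation rules.
    """
    fields = (
        ("model_id", model_id),
        ("connection_id", connection_id),
        ("specs", specs),
    )
    for name, value in fields:
        if not value or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")
    return {
        "model_id": model_id,
        "annotation": {"connection_id": connection_id, "specs": specs},
    }


request = build_tune_request(
    model_id="jinaai/jina-embeddings-v2-base-en",
    connection_id="conn_123",
    specs="specs.json",
)
# With a configured client, the request could then be sent as:
#   mixpeek.models.tune(**request)
```

Building the dictionary separately from the call also makes it easy to log which base model and connection each fine-tuning job used.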

3. Version Control

It’s recommended to store the version_id of each model state as it is fine-tuned.

If you add a model_version to the pipeline’s metadata, it is automatically written to your database, so you can filter on it at query time:

mixpeek.pipelines.create(
  ...
  destination={
    ...
    "metadata": {
      "model_version": "model_1askdh2390"
    }
  }
)
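
The effect of query-time filtering on model_version can be illustrated without a database. The sketch below assumes each stored row carries the pipeline metadata as a plain dictionary. StoredEmbedding and filter_by_model_version are hypothetical names for illustration, and a real database would apply the same condition through its own filter syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredEmbedding:
    """Hypothetical shape of one row written by a pipeline."""
    id: str
    vector: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)


def filter_by_model_version(rows, model_version):
    """Keep only rows whose metadata was tagged with `model_version`."""
    return [r for r in rows if r.metadata.get("model_version") == model_version]


rows = [
    StoredEmbedding("a", [0.1, 0.2], {"model_version": "model_1askdh2390"}),
    StoredEmbedding("b", [0.3, 0.4], {"model_version": "jinaai/jina-embeddings-v2-base-en"}),
    StoredEmbedding("c", [0.5, 0.6]),  # written before versions were tracked
]

matches = filter_by_model_version(rows, "model_1askdh2390")
print([r.id for r in matches])  # prints ['a']
```

Filtering this way keeps vectors from different model states from being compared with each other, which matters because a fine-tuned model’s embedding space is not guaranteed to line up with that of its base model.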