Available to Enterprise customers only

To fine-tune any of the models (extract, generate, and embed), follow these steps:

1

Send Annotated Data

Send your annotated data to an S3 bucket and connect it via the Connections service. This will return a connection_id. Ensure the data is well-organized and labeled according to the model’s requirements (we’ll provide specs)

2

Initiate Fine-Tuning

Use the Mixpeek API to start a fine-tuning job. Specify the base model_id, the S3 bucket path, and specs that tell the fine-tuner how to run:

mixpeek.models.tune(
  model_id="jinaai/jina-embeddings-v2-base-en",
  annotation={
    "connection_id": "conn_123",
    "specs": "specs.json"
  }
)

This will return a new model_id that you can use in your pipeline, for example: model_1askdh2390, which you’ll then use in your methods like:

  mixpeek.embed.text(model_id="model_1askdh2390", input="hello")
3

Version Control

It’s recommended to store the version_id of each model state as it is fine-tuned.

If you append a model_version in the metadata of the pipeline it’ll automatically be added to your database so you can do query-time filtering:

mixpeek.pipelines.create(
  ...
  destination={
    ...
    "metadata": {
      "model_version": "model_1askdh2390"
    }
  }
)