Fine-Tuning
To fine-tune any of the models (extract
, generate
, and embed
), follow these steps:
Send Annotated Data
Send your annotated data to an S3 bucket and connect it via the
Connections service. This will return a connection_id
.
Ensure the data is well-organized and labeled according to the model’s requirements
(we’ll provide specs)
Initiate Fine-Tuning
Use the Mixpeek API to start a fine-tuning job. Specify the base
model_id
, the S3 bucket path, and specs
that tell the fine-tuner how to run:
mixpeek.models.tune(
model_id="jinaai/jina-embeddings-v2-base-en",
annotation={
"connection_id": "conn_123",
"specs": "specs.json"
}
)
This will return a new model_id that you can use in your pipeline, for example: model_1askdh2390
, which you’ll then use in your methods like:
mixpeek.embed.text(model_id="model_1askdh2390", input="hello")
Version Control
It’s recommended to store the version_id of each model state as it is fine-tuned.
If you append a model_version
in the metadata of the pipeline it’ll automatically be added to your database so you can do query-time filtering:
mixpeek.pipelines.create(
...
destination={
...
"metadata": {
"model_version": "model_1askdh2390"
}
}
)
Was this page helpful?