Overview
ML model options for each method
The tables below list all the models currently supported in the SDK. Each model is designed for a specific task and type of data, such as embedding, reading, describing, or transcribing.
If you plan on storing the raw embeddings yourself, see the embedding storage section.
We add models regularly, so if we’re missing any, reach out.
Embedding Models
Embedding models convert data into numerical vectors, enabling efficient similarity searches and machine learning tasks.
Name | Modality | Dimensions | Description |
---|---|---|---|
multimodal-v1 | Text, Image, Video | 1408 | Most general-purpose model, but slower and can be less precise |
clip-v1 | Text, Image | 512 | A versatile model for text and image embeddings |
vuse-generic-v1 | Text, Video | 768 | Specialized for text and video embeddings |
splade-v3 | Text | N/A (sparse) | Full-text model for text-only embeddings |
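If you store the raw embeddings yourself, a similarity search comes down to comparing vectors. The snippet below is a minimal sketch using NumPy cosine similarity; the random vectors are placeholders standing in for embeddings returned by multimodal-v1 (1408 dimensions), and the helper function is illustrative rather than part of the SDK.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors of the same dimension."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for embeddings returned by multimodal-v1 (1408 dims).
query_embedding = np.random.rand(1408)
document_embedding = np.random.rand(1408)

score = cosine_similarity(query_embedding, document_embedding)
print(f"similarity: {score:.4f}")  # values closer to 1.0 mean more similar
```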
Read Models
Read models extract the text that appears in a visual asset.
Name | Modality | Description |
---|---|---|
video-descriptor-v1 | Video | Extracts key information and metadata from video content |
Describe Models
Describe models generate human-readable descriptions or summaries of input data.
Name | Modality | Description |
---|---|---|
video-descriptor-v1 | Video | Generates detailed descriptions of video content |
Transcribe Models
Transcribe models convert spoken language into written text.
Name | Modality | Description |
---|---|---|
polyglot-v1 | Audio | Transcribes speech from multiple languages |
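As a quick reference, the task-to-model mapping implied by the tables above can be collected into a small lookup. This is an illustrative sketch only: the SUPPORTED_MODELS dictionary and the choose_model helper are hypothetical and not part of the SDK; only the model names are taken from this page.

```python
# Model names come from the tables above; the lookup and helper are illustrative only.
SUPPORTED_MODELS = {
    "embed": ["multimodal-v1", "clip-v1", "vuse-generic-v1", "splade-v3"],
    "read": ["video-descriptor-v1"],
    "describe": ["video-descriptor-v1"],
    "transcribe": ["polyglot-v1"],
}

def choose_model(task: str, preferred: str | None = None) -> str:
    """Return a supported model name for a task, falling back to the first listed."""
    models = SUPPORTED_MODELS.get(task)
    if models is None:
        raise ValueError(f"Unknown task: {task!r}. Expected one of {sorted(SUPPORTED_MODELS)}.")
    if preferred is not None:
        if preferred not in models:
            raise ValueError(f"{preferred!r} does not support the {task!r} task.")
        return preferred
    return models[0]

print(choose_model("embed"))                      # multimodal-v1
print(choose_model("transcribe", "polyglot-v1"))  # polyglot-v1
```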