A Simple Example

Say you upload an image: dress.png to your S3 bucket for your apparel ecommerce store and you want to extract the description and keywords, and create an embedding of the dress itself.

You’d create a Mixpeek pipeline that combines ML models and gets invoked whenever there’s a new object in your bucket.

Your first pipeline

Here’s an example pipeline that extracts a description, keywords, and creates an embedding.

def function(event, context):
  mixpeek = Mixpeek("API_KEY")

  # extract description
  description = mixpeek.extract.text(model_id="model_id_1")
  # extract keywords
  keywords = mixpeek.extract.text(model_id="model_id_2")
  # create an embedding
  embedding = mixpeek.embed.image(model_id="model_id_3")

  return {
    "file_url": event.object_url,
    "description": description,
    "keywords": keywords,
    "embedding": embedding
}

Create pipeline documentation

Once we create this pipeline then connect our S3 bucket and finally MongoDB collection then enable it.

Your first connection

Every new account comes with a preconfigured S3 and MongoDB connection so you can get started right away. Alternatively, you bring your own.

default connections

Create connection documentation

Setup pipelines

Two options for pipeline creation

Previously, we would have created an AWS IAM role, which opens up a listener on your S3 bucket apparel, every new object gets sent through the pipeline-as-code we defined above and then sent into our MongoDB collection: apparel_items (which was also instantiated previously).

That’s really it! Mixpeek is designed to be “set and forget”, never think about processing your S3 bucket again.

Sample output

Here’s an example output from the S3 object: dress.png that gets sent into your MongoDB collection:

{
  "object_url": "s3://dress.png",
  "description": "Elegant summer dress",
  "keywords": ["summer", "dress", "casual"],
  "embedding": [0, 1, 2, 3],
  "metadata": {
    "pipeline_version": "v1"
  }
}

Use the Methods directly

You can also use the extract, embed and generate methods outside of a pipeline.

from mixpeek import Mixpeek

mixpeek = Mixpeek("API_KEY")

output = mixpeek.embed.text(
  input="lorem ipsum",
  model_id="mixedbread-ai/mxbai-embed-large-v1"
)

To use the API, you need to register an API key and an engineer will contact you.

What does it enable?

With fresh vectors, metadata, and extracted content from your apparel images, you can design hyper-targeted marketing campaigns and improve product discovery on your ecommerce platform:

  • Personalized Recommendations: Suggest products based on user preferences and past interactions.
  • Visual Search: Allow users to search for products using images instead of text.
  • Trend Analysis: Identify and capitalize on emerging fashion trends by analyzing frequently occurring keywords and styles.

All without having to think about data prep again. You can even modify your pipeline, and the new version will be appended to the metadata.pipeline_version key, so you can filter your pipeline code changes against the output data.