[Figure: Image extractor pipeline showing SigLIP processing and embedding generation]
The image extractor generates dense 768-dimensional vector embeddings from images using Google’s SigLIP model. It is optimized for visual similarity search, product matching, and cross-modal search with text queries. Processing is fast (~50-100 ms per image) and costs 10 credits per image.
View extractor details at api.mixpeek.com/v1/collections/features/extractors/image_extractor_v1 or fetch programmatically with GET /v1/collections/features/extractors/{feature_extractor_id}.
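As a minimal sketch of the programmatic route, the snippet below builds (but does not send) the GET request using only the Python standard library. The `https://` scheme, the bearer-token `Authorization` header, and the helper name `extractor_details_request` are assumptions made for illustration; check the API reference for the actual authentication scheme.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: the API is served over HTTPS at this host (the page lists the host without a scheme).
BASE_URL = "https://api.mixpeek.com"


def extractor_details_request(feature_extractor_id: str, api_key: str) -> Request:
    """Build, without sending, a GET request for one extractor's details."""
    path = f"/v1/collections/features/extractors/{quote(feature_extractor_id, safe='')}"
    # Assumption: bearer-token auth. The header name and format are hypothetical.
    headers = {"Authorization": f"Bearer {api_key}", "Accept": "application/json"}
    return Request(BASE_URL + path, headers=headers, method="GET")


req = extractor_details_request("image_extractor_v1", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the call; it is left out here so the sketch runs without network access or credentials.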

When to Use

| Use Case | Description |
| --- | --- |
| Image search | Find visually similar images in large collections |
| Visual similarity | Match products, artwork, or content by appearance |
| Content discovery | Recommend similar visual content |
| Cross-modal search | Find images using text queries (via SigLIP text encoder) |
| E-commerce | Product image search and visual recommendations |
| Stock photo search | Media library search by visual content |

When NOT to Use

| Scenario | Recommended Alternative |
| --- | --- |
| Face recognition | face_identity_extractor |
| Video content | multimodal_extractor |
| Text-heavy images requiring OCR | multimodal_extractor with OCR enabled |
| Audio content | audio_extractor |

Input Schema

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| image | string | Yes | URL or S3 path to image file. Formats: JPEG, PNG, WebP, BMP. Any resolution (resized to 224x224 internally). |

{
  "image": "s3://my-bucket/products/laptop-pro.jpg"
}
Input Examples:

| Type | Example |
| --- | --- |
| Product image | s3://my-bucket/products/laptop-pro.jpg |
| Stock photo | https://cdn.example.com/photos/sunset-beach.jpg |
| Catalog image | s3://catalog/items/SKU-12345.png |

Supported Formats: JPEG, PNG, WebP, BMP, GIF (static)
Recommended Resolution: 224x224 or larger (automatically resized)
Max File Size: 10 MB recommended
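A client-side pre-check can catch obviously unsupported inputs before any credits are spent. The sketch below is a hypothetical helper, not part of the Mixpeek API: it assumes that `http`, `https`, and `s3` are the accepted schemes and that the file extension reflects the actual format. It cannot detect animated GIFs or oversized files.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions for the formats listed above (JPEG, PNG, WebP, BMP, static GIF).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif"}
# Assumption: "URL or S3 path" means one of these schemes.
ALLOWED_SCHEMES = {"http", "https", "s3"}


def check_image_reference(image: str) -> list[str]:
    """Return a list of problems found; an empty list means the reference looks fine."""
    problems = []
    parsed = urlparse(image)
    if parsed.scheme not in ALLOWED_SCHEMES:
        problems.append(f"unsupported scheme: {parsed.scheme or '(none)'}")
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
    return problems


for ref in ("s3://my-bucket/products/laptop-pro.jpg", "ftp://host/scan.tiff"):
    print(ref, "->", check_image_reference(ref) or "ok")
```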

Output Schema

| Field | Type | Description |
| --- | --- | --- |
| image_extractor_v1_embedding | float[768] | SigLIP image embedding, L2 normalized |
| processing_time_ms | number | Processing time in milliseconds |
| thumbnail_url | string | S3 URL of the thumbnail image (if generated) |

{
  "image_extractor_v1_embedding": [0.023, -0.041, 0.018, ...],
  "processing_time_ms": 85.2,
  "thumbnail_url": "s3://mixpeek-storage/ns_123/thumbnails/thumb_001.jpg"
}
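Because the embedding is documented as 768-dimensional and L2 normalized, a consumer can sanity-check both properties on receipt. The following is a minimal sketch with a hypothetical helper name and tolerance; the demo vector is synthetic, not real model output.

```python
import math

EXPECTED_DIM = 768


def check_embedding(vec, expected_dim=EXPECTED_DIM, tol=1e-3):
    """Return (dimension_ok, l2_norm_ok) for an embedding given as a list of floats."""
    norm = math.sqrt(sum(x * x for x in vec))
    return len(vec) == expected_dim, abs(norm - 1.0) <= tol


# Synthetic stand-in for a real embedding: a constant vector scaled to unit length.
demo = [1.0 / math.sqrt(EXPECTED_DIM)] * EXPECTED_DIM
print(check_embedding(demo))
```

For unit-length vectors, cosine similarity reduces to a plain dot product, which is why the normalization check is worth running before comparing vectors by hand.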

Parameters

The image extractor uses sensible defaults and requires no additional parameters for basic usage.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| None required | - | - | All parameters use optimized defaults |

Configuration Examples

{
  "feature_extractor": {
    "feature_extractor_name": "image_extractor",
    "version": "v1",
    "input_mappings": {
      "image": "payload.image_url"
    },
    "field_passthrough": [
      { "source_path": "metadata.product_id" }
    ],
    "parameters": {}
  }
}
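When several collections share this extractor, it can help to generate the configuration rather than copy it. The sketch below builds the same structure as the JSON above; the function name `image_extractor_config` is hypothetical, and only the keys shown in the example are used.

```python
import json


def image_extractor_config(image_field, passthrough_fields=()):
    """Build the feature_extractor block shown above for a given input field."""
    return {
        "feature_extractor": {
            "feature_extractor_name": "image_extractor",
            "version": "v1",
            # Maps the extractor's "image" input to a field of the source object.
            "input_mappings": {"image": image_field},
            # Fields copied through to the output unchanged.
            "field_passthrough": [{"source_path": p} for p in passthrough_fields],
            # No parameters are required; defaults apply.
            "parameters": {},
        }
    }


config = image_extractor_config("payload.image_url", ["metadata.product_id"])
print(json.dumps(config, indent=2))
```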

Performance & Costs

| Metric | Value |
| --- | --- |
| Processing speed | ~50-100 ms per image |
| Batch processing | Up to 16 images per batch |
| GPU acceleration | Supported for faster inference |
| Cost | 10 credits per image |
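These figures translate into a quick back-of-the-envelope estimate for a job. The sketch below assumes images are processed one after another at the quoted per-image latency, so the time range is a rough upper bound; actual throughput with batching or GPU acceleration may differ. The function name is hypothetical.

```python
import math

CREDITS_PER_IMAGE = 10
MAX_BATCH_SIZE = 16
MS_PER_IMAGE_RANGE = (50, 100)  # quoted per-image processing time


def estimate_job(num_images):
    """Estimate batches, credits, and a sequential processing-time range in seconds."""
    low_ms, high_ms = MS_PER_IMAGE_RANGE
    return {
        "batches": math.ceil(num_images / MAX_BATCH_SIZE),
        "credits": num_images * CREDITS_PER_IMAGE,
        # Assumption: strictly sequential processing, no parallelism.
        "seconds": (num_images * low_ms / 1000, num_images * high_ms / 1000),
    }


print(estimate_job(1000))
```

For 1,000 images this gives 63 batches (62 full batches plus one of 8 images) and 10,000 credits.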

Vector Index

| Property | Value |
| --- | --- |
| Index name | image_extractor_v1_embedding |
| Dimensions | 768 |
| Type | Dense |
| Distance metric | Cosine |
| Datatype | float32 |
| Inference model | google_siglip_base_v1 |

The SigLIP embeddings are compatible with SigLIP text embeddings, enabling cross-modal search where you can:
  • Find images using natural language text queries
  • Match images to text descriptions
  • Build hybrid search combining visual and textual similarity
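To illustrate how cross-modal ranking works on L2-normalized vectors, the sketch below scores stored image embeddings against a text-query embedding with a dot product, which equals cosine similarity for unit vectors. The 3-dimensional vectors and file names are toy values chosen for readability; real embeddings have 768 dimensions and come from the SigLIP image and text encoders.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))


def rank_images(query_vec, image_vecs, top_k=3):
    """Return the top_k (image_id, score) pairs, highest similarity first."""
    scored = [(image_id, dot(query_vec, vec)) for image_id, vec in image_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy unit vectors standing in for stored image embeddings.
images = {
    "sunset.jpg": [1.0, 0.0, 0.0],
    "laptop.jpg": [0.0, 1.0, 0.0],
    "beach.jpg": [0.6, 0.0, 0.8],
}
# Hypothetical embedding of a text query, also unit length.
text_query = [0.8, 0.0, 0.6]

for image_id, score in rank_images(text_query, images, top_k=2):
    print(f"{image_id}: {score:.2f}")
```

In production the vector index performs this ranking; the sketch only shows why a text embedding can be compared directly against image embeddings from the same model.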

Comparison with Other Image Extractors

| Feature | image_extractor | multimodal_extractor |
| --- | --- | --- |
| Dimensions | 768 | 1408 |
| Model | SigLIP | Vertex AI Multimodal |
| Processing | Image only | Video, Image, Text, GIF |
| Cross-modal | SigLIP text encoder | Vertex text encoder |
| Best For | Fast image search | Unified multimodal search |
| Cost | 10 credits/image | Higher (includes more features) |
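The difference in dimensions also affects index size. As a rough worked example, the sketch below computes raw vector storage at 4 bytes per float32 value. It assumes the multimodal_extractor vectors are also stored as float32 (the page states this only for image_extractor) and ignores index overhead and metadata.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors, dimensions):
    """Raw storage for dense float32 vectors, excluding index overhead and payloads."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


n = 1_000_000
image_bytes = raw_vector_bytes(n, 768)        # image_extractor
multimodal_bytes = raw_vector_bytes(n, 1408)  # multimodal_extractor (float32 assumed)

print(f"image_extractor:      {image_bytes / 1e9:.3f} GB")
print(f"multimodal_extractor: {multimodal_bytes / 1e9:.3f} GB")
```

One million vectors come to about 3.07 GB at 768 dimensions versus about 5.63 GB at 1408 dimensions, so the smaller embedding needs roughly 45% less raw storage.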

Limitations

  • Image only: Does not process video, audio, or text content
  • No OCR: Cannot extract text from images; use multimodal_extractor with OCR
  • No face recognition: For face matching, use face_identity_extractor
  • Single image: Each input holds one image; batching (up to 16 images per batch) is handled by the API
  • Resolution: Input is resized to 224x224 internally