Buckets are the foundation of Mixpeek’s storage architecture. They serve as containers for raw objects and their associated files before processing. They are your entry point for all multimodal processing, search and analysis.

Overview

Buckets accept objects, which are composed of blobs (related files or JSON data). Collections then process those objects into documents.

1. Define Bucket Schema

First, create a bucket with a schema that defines what types of files your objects will contain. This schema validation ensures data consistency and proper processing.

2. Select Blobs

Gather the files (blobs) you want to include in your object. These can be images, videos, documents, audio files, or JSON data that are related to each other.

3. Upload Blobs as Object

Bundle your selected blobs into an object and upload it to your bucket. You can include object metadata and organize with prefixes for better structure.

Storage Containers

Securely store raw objects and their associated files in logically grouped containers

Object Organization

Organize objects based on use case, content type, or processing requirements

Key Concepts

Creating a Bucket

Python
from mixpeek import Mixpeek

mp = Mixpeek(api_key="YOUR_API_KEY")

# Create a bucket
bucket = mp.buckets.create(
    namespace="ns_abc123",
    bucket_name="product-images",
    description="Product images for e-commerce catalog",
    # The schema declares which blob types each object may contain
    schema={
        "type": "object",
        "properties": {
            "video_1": {
                "type": "video"
            },
            "pdf_1": {
                "type": "pdf"
            }
        }
    }
)

bucket_id = bucket["bucket_id"]
print(f"Created bucket: {bucket_id}")
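
When a bucket has many properties, writing the nested schema by hand gets repetitive. The following is a minimal sketch that builds the same dictionary shape used in the example above from a flat mapping. The helper name `build_schema` is hypothetical and not part of the Mixpeek SDK; it only assumes the schema structure shown in this page.

```python
def build_schema(properties):
    """Build a bucket schema dict from a {property_name: type_name} mapping.

    Hypothetical helper: it mirrors the schema shape shown in the
    bucket-creation example and does not call any Mixpeek API.
    """
    return {
        "type": "object",
        "properties": {
            name: {"type": type_name} for name, type_name in properties.items()
        },
    }


# Same schema as the example above, built from a flat mapping
schema = build_schema({"video_1": "video", "pdf_1": "pdf"})
```

The resulting `schema` value could then be passed as the `schema` argument when creating the bucket.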

Objects

Once you’ve created a bucket, you can add objects to it. Objects are collections of related blobs that represent a single entity in your domain.

Create an object in your bucket. Each object can contain multiple related files.

# Create an object with multiple files
mp.objects.create(
    bucket=bucket_id,
    prefix="/files",
    # Object metadata is passed down to each document
    metadata={
        "name": "red-sneaker-product"
    },
    blobs=[
        {
            "url": "https://example.com/images/red-sneaker-front.jpg",
            "mimetype": "image/jpeg"
        },
        {
            "url": "https://example.com/images/red-sneaker-side.jpg",
            "mimetype": "image/jpeg"
        },
        {
            "url": "https://example.com/data/red-sneaker-specs.txt",
            "mimetype": "text/plain"
        }
    ]
)
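
Each blob entry pairs a URL with a MIME type. As a rough sketch, the list can be assembled from plain URLs by guessing the MIME type from the file extension with Python's standard `mimetypes` module. The helper name `build_blobs` is hypothetical and not part of the Mixpeek SDK; it only produces the `url`/`mimetype` dictionaries used in the example above.

```python
import mimetypes


def build_blobs(urls):
    """Return a list of {"url", "mimetype"} dicts for the given URLs.

    Hypothetical helper: the MIME type is guessed from the file extension,
    and a ValueError is raised when no type can be inferred.
    """
    blobs = []
    for url in urls:
        mimetype, _encoding = mimetypes.guess_type(url)
        if mimetype is None:
            raise ValueError(f"Cannot infer MIME type for {url}")
        blobs.append({"url": url, "mimetype": mimetype})
    return blobs


blobs = build_blobs([
    "https://example.com/images/red-sneaker-front.jpg",
    "https://example.com/data/red-sneaker-specs.txt",
])
```

Guessing by extension is a convenience only; when the extension is missing or misleading, set the `mimetype` explicitly.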

Best Practices

Logical Grouping

Group objects in buckets based on logical collections, such as product categories, content types, or processing requirements

Naming Conventions

Use consistent naming patterns for buckets to make them easily identifiable and manageable

Metadata Usage

Metadata from your objects is passed down to all associated documents in your destination collections.

Resource Optimization

Monitor bucket usage and distribute objects across multiple buckets if needed for performance optimization

Common Use Cases

E-commerce Products

Store product images, videos, specs, and descriptions for catalog processing

Media Assets

Organize images, videos, and audio files for media libraries

Documentation

Manage PDFs, technical documents, and related assets

Blobs

Blobs represent individual files within Objects. While Objects group related files together, Blobs are the actual raw file content or JSON data that gets processed by feature extractors.

Once your blobs are processed into objects, they keep the prefix structure you assigned on upload, so you can navigate them like a standard file system.

Supported File Types

Format  MIME Type    Max Size  Notes
JPEG    image/jpeg   50MB      RGB color space
PNG     image/png    50MB      Transparency supported
WebP    image/webp   50MB      Modern format
GIF     image/gif    50MB      Animated GIFs supported
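
The table above can be turned into a client-side check that runs before upload. This is a sketch only: the function name `check_image_blob` is hypothetical, and it assumes "50MB" means 50 × 1024 × 1024 bytes, which the table does not specify.

```python
# MIME types and size limit taken from the Supported File Types table.
SUPPORTED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/webp", "image/gif"}
MAX_IMAGE_BYTES = 50 * 1024 * 1024  # assumption: 50MB read as 50 MiB


def check_image_blob(mimetype, size_bytes):
    """Return a list of problems for one image blob; empty means it passes."""
    problems = []
    if mimetype not in SUPPORTED_IMAGE_TYPES:
        problems.append(f"unsupported MIME type: {mimetype}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(
            f"file is {size_bytes} bytes; limit is {MAX_IMAGE_BYTES} bytes"
        )
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a file in a single pass.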

Best Practices for Blob Management

File Organization

Group related blobs into objects for better organization and processing efficiency

Metadata Usage

Add descriptive metadata to blobs to improve searchability and organization

Size Optimization

Compress large files when possible to improve upload and processing speed

Format Selection

Use recommended formats for each content type to ensure optimal processing

Limitations

Bucket Limitations

  • Storage Quotas: Each namespace has limits on total bucket storage capacity based on your plan
  • Bucket Naming: Bucket names must be unique within a namespace and follow naming conventions
  • Rate Limits: API operations on buckets are subject to rate limiting based on your account tier
  • Schema Changes: Bucket schemas cannot be modified after creation; a new bucket must be created

Object Limitations

  • Size Restrictions: Objects have a maximum combined blob size of 10GB per object
  • Metadata Size: Object metadata is limited to 100KB in size
  • Immutability: Object structure cannot be modified after creation (blobs cannot be added or removed)
  • Prefix Depth: Object prefixes are limited to a maximum of 20 levels of nesting

Blob Limitations

  • Size Constraints: Maximum blob size varies by file type (see Supported File Types above)
  • Quantity Limits: Each object can contain at most 10 blobs
  • Format Restrictions: Supported MIME types are limited to those listed above
  • Content Immutability: Blob content is immutable once uploaded
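
Because objects and blobs cannot be changed after creation, it can help to check the limits above locally before uploading. The sketch below encodes the documented numbers; the function name `check_object` is hypothetical, and three details are assumptions: "10GB" and "100KB" are read as binary units, metadata size is measured as UTF-8 encoded JSON, and prefix depth is counted as the number of non-empty path segments.

```python
import json

# Limits taken from the Limitations section above.
MAX_BLOBS_PER_OBJECT = 10
MAX_OBJECT_BYTES = 10 * 1024**3   # assumption: 10GB read as 10 GiB
MAX_METADATA_BYTES = 100 * 1024   # assumption: 100KB read as 100 KiB
MAX_PREFIX_DEPTH = 20


def check_object(prefix, metadata, blob_sizes):
    """Return a list of limit violations for a planned object upload.

    blob_sizes is a list of blob sizes in bytes. An empty result means
    no documented limit is exceeded under the assumptions stated above.
    """
    problems = []
    if len(blob_sizes) > MAX_BLOBS_PER_OBJECT:
        problems.append(f"{len(blob_sizes)} blobs; limit is {MAX_BLOBS_PER_OBJECT}")
    if sum(blob_sizes) > MAX_OBJECT_BYTES:
        problems.append("combined blob size exceeds 10GB")
    # Assumption: metadata size is measured on its JSON serialization
    metadata_bytes = len(json.dumps(metadata).encode("utf-8"))
    if metadata_bytes > MAX_METADATA_BYTES:
        problems.append(f"metadata is {metadata_bytes} bytes; limit is 100KB")
    # Assumption: depth is the number of non-empty segments in the prefix
    depth = len([part for part in prefix.split("/") if part])
    if depth > MAX_PREFIX_DEPTH:
        problems.append(f"prefix depth is {depth}; limit is {MAX_PREFIX_DEPTH}")
    return problems
```

For example, `check_object("/files", {"name": "red-sneaker-product"}, [2_000_000, 1_500_000, 4_096])` returns an empty list, since three small blobs under a one-level prefix stay within every limit.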