Create a sync configuration for automated storage ingestion.
Establishes automated synchronization between an external storage provider and a Mixpeek bucket. The sync monitors the source path and ingests files according to the specified mode and filters.
Supported Providers: google_drive, s3, snowflake, sharepoint, tigris
Built-in Robustness: dead letter queue (DLQ) with retry-then-quarantine, idempotent ingestion via deduplication, distributed locking, automatic backoff on provider rate limits, and per-run metrics.
Sync Modes:
- continuous: Real-time monitoring with a configurable polling interval
- one_time: Single bulk import, then the sync stops
- scheduled: Polling-based batch imports at fixed intervals

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.
"Bearer YOUR_API_KEY"
"Bearer sk_xxxxxxxxxxxxx"
REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'
"ns_abc123def456"
"production"
"my-namespace"
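Every request carries both headers above. A minimal sketch of assembling them; the namespace header name used here ("X-Namespace") is an assumption for illustration, so check your client library or the API reference for the exact name:

```python
# Build the two required headers for a Mixpeek API request.
# "X-Namespace" is an assumed header name, not confirmed by this reference.
def build_headers(api_key: str, namespace: str) -> dict:
    """Return auth + namespace scoping headers."""
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Namespace": namespace,  # accepts a name or an ns_... ID
    }

headers = build_headers("sk_xxxxxxxxxxxxx", "my-namespace")
```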
Request to create a bucket sync configuration.
Establishes automated synchronization between a storage connection and a bucket. The sync monitors the source path for changes and ingests files according to the specified mode and filters.
Supported Storage Providers:
- google_drive: Google Drive and Workspace shared drives
- s3: Amazon S3 and S3-compatible stores (MinIO, DigitalOcean Spaces, Wasabi)
- snowflake: Snowflake data warehouse tables (rows become objects)
- sharepoint: Microsoft SharePoint and OneDrive for Business
- tigris: Tigris globally distributed object storage
Robustness Features (built-in):
- Dead Letter Queue (DLQ): Failed objects tracked with 3 retries before quarantine
- Idempotent ingestion: Deduplication via (bucket_id, source_provider, source_object_id)
- Distributed locking: Prevents concurrent execution of the same sync config
- Rate limit handling: Automatic backoff on provider 429 responses
- Metrics: Duration, files synced/failed, batches created, rate limit hits
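The idempotency guarantee keys on the (bucket_id, source_provider, source_object_id) triple. A client-side sketch of the same idea, useful if you mirror sync state locally; the data structures here are illustrative, not part of the API:

```python
# Illustrative sketch of idempotent ingestion: the same source object is
# never ingested twice for a given bucket and provider.
seen: set[tuple[str, str, str]] = set()

def ingest_once(bucket_id: str, provider: str, object_id: str) -> bool:
    """Return True if the object was newly ingested, False if deduplicated."""
    key = (bucket_id, provider, object_id)
    if key in seen:
        return False  # duplicate: skip re-ingestion
    seen.add(key)
    return True

ingest_once("bkt_1", "s3", "videos/a.mp4")  # first sight: ingested
ingest_once("bkt_1", "s3", "videos/a.mp4")  # same triple: deduplicated
```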
Sync Modes:
- continuous: Real-time monitoring with polling interval
- one_time: Single bulk import then stops
- scheduled: Polling-based batch imports
Requirements:
- connection_id: REQUIRED, must be an existing connection
- source_path: REQUIRED, path must exist in the storage provider
- sync_mode: OPTIONAL, defaults to 'continuous'
- All other fields are OPTIONAL with sensible defaults
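Per the requirements above, a minimal request body needs only connection_id and source_path; everything else falls back to defaults. A sketch, noting that the exact JSON names for the interval and batch size fields are assumptions based on the descriptions in this reference:

```python
# Minimal create-sync payload: only the two REQUIRED fields.
# Omitted fields take their documented defaults (sync_mode='continuous',
# 300 s polling interval, 50 files per batch, no filters).
minimal_sync = {
    "connection_id": "conn_abc123",
    "source_path": "my-bucket/videos",
}

# A fuller payload overriding defaults explicitly; "polling_interval" and
# "batch_size" are assumed field names, not confirmed by this reference.
full_sync = {
    "connection_id": "conn_s3_prod",
    "source_path": "my-bucket/videos",
    "sync_mode": "scheduled",
    "polling_interval": 600,  # seconds, must be within 30-900
    "batch_size": 100,        # must be within 1-100
}
```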
REQUIRED. Storage connection identifier to sync from. Must reference an existing connection created via POST /organizations/connections. The connection defines the storage provider and credentials. Supported providers: google_drive, s3, snowflake, sharepoint, tigris.
"conn_abc123"
"conn_s3_prod"
REQUIRED. Source path within the storage provider to monitor and sync. Path format varies by provider:
- s3/tigris: 'bucket-name/prefix' or 'bucket-name'
- google_drive: folder ID or path like '/Marketing/Assets'
- sharepoint: '/sites/SiteName/Shared Documents/folder'
- snowflake: 'DATABASE.SCHEMA.TABLE' or just 'TABLE' if defaults set
"my-bucket/videos"
"0AH-Xabc123"
"PROD.PUBLIC.CUSTOMERS"
Synchronization mode determining how files are monitored and ingested. OPTIONAL. Defaults to 'continuous'. 'continuous': Actively monitors for new files and syncs immediately. 'one_time': Performs a single sync of existing files then stops. 'scheduled': Syncs on polling intervals only.
initial_only, continuous
"continuous"
"one_time"
"scheduled"
OPTIONAL. Filters to control which files are synced. When omitted, all files in source_path are synced. Supported filters:
- include_patterns: Glob patterns to include (e.g., ['*.mp4', '*.mov'])
- exclude_patterns: Glob patterns to exclude (e.g., ['*.tmp', '.DS_Store'])
- extensions: File extensions to include (e.g., ['.mp4', '.jpg'])
- min_size_bytes: Minimum file size in bytes
- max_size_bytes: Maximum file size in bytes
- modified_after: ISO datetime, only sync files modified after this time
- mime_types: List of MIME types to include (e.g., ['video/*', 'image/jpeg'])
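The include/exclude semantics can be sketched with Python's fnmatch. This illustrates how such filters typically compose, with exclusions taking precedence over inclusions; it is not the service's actual implementation:

```python
from fnmatch import fnmatch

def should_sync(name: str, include: list[str], exclude: list[str]) -> bool:
    """Apply include/exclude glob patterns the way sync filters typically do:
    a file must match no exclude pattern, and must match at least one
    include pattern (or the include list must be empty)."""
    if any(fnmatch(name, pat) for pat in exclude):
        return False
    if include and not any(fnmatch(name, pat) for pat in include):
        return False
    return True

should_sync("clip.mp4", ["*.mp4", "*.mov"], ["*.tmp"])   # True
should_sync("notes.tmp", ["*.mp4", "*.mov"], ["*.tmp"])  # False
```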
{ "extensions": [".mp4", ".mov"] }

OPTIONAL. Defines how source data maps to bucket schema fields and blobs. When provided, enables structured extraction of metadata from the sync source. Keys are target bucket schema field names, values define the source extraction method.
Blob Mappings (target_type='blob'): Map files or URLs to blob fields. Use source.type='file' for the synced file itself, or source.type='column'/'metadata' for URLs.
Field Mappings (target_type='field'): Map metadata to schema fields. Source options by provider:
- S3/Tigris: 'tag' (object tags), 'metadata' (x-amz-meta-*)
- Snowflake: 'column' (table columns)
- Google Drive: 'drive_property' (file properties)
- All: 'filename_regex', 'folder_path', 'constant'
If omitted, the default behavior depends on the provider; typically the synced file is mapped to the 'content' blob.
{
"mappings": {
"category": {
"source": { "key": "category", "type": "tag" },
"target_type": "field"
},
"content": {
"blob_type": "auto",
"source": { "type": "file" },
"target_type": "blob"
}
}
}

Interval in seconds between polling checks for new files. OPTIONAL. Defaults to 300 seconds (5 minutes). Must be between 30 and 900 seconds (0.5 to 15 minutes). Only applies to 'continuous' and 'scheduled' sync modes. Lower values mean faster detection but higher API usage.
30 <= x <= 900
60
300
600
Number of files to process in each batch during sync. OPTIONAL. Defaults to 50 files per batch. Must be between 1 and 100. Larger batches improve throughput but require more memory. Smaller batches provide more granular progress tracking.
1 <= x <= 100
10
50
100
If True, sync objects to the bucket without creating or submitting batches for collection processing. Objects are created in the bucket but no tier processing is triggered. Useful for bulk data migration or when you want to manually control when processing occurs. OPTIONAL. Defaults to False (batches are created and submitted).
false
true
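The fields above cover everything needed to build a create request. A sketch assembling them into an HTTP call; the route '/v1/buckets/{bucket_id}/syncs', the base URL, and the namespace header name are all assumptions for illustration, not documented values:

```python
import json

API_BASE = "https://api.mixpeek.com"  # assumed base URL, for illustration only

def build_create_sync_request(api_key: str, namespace: str, bucket_id: str,
                              payload: dict) -> tuple[str, dict, bytes]:
    """Return (url, headers, body) ready for any HTTP client.
    The route '/v1/buckets/{bucket_id}/syncs' is a hypothetical example."""
    url = f"{API_BASE}/v1/buckets/{bucket_id}/syncs"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "X-Namespace": namespace,            # header name assumed
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(payload).encode()

url, headers, body = build_create_sync_request(
    "sk_xxxxxxxxxxxxx", "production", "bkt_marketing_assets",
    {"connection_id": "conn_s3_prod", "source_path": "my-bucket/videos"},
)
```

Pass the three values to your HTTP client of choice (requests, httpx, urllib) as a POST.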
Optional custom metadata to attach to the sync configuration. NOT REQUIRED. Arbitrary key-value pairs for tagging and organization. Common uses: project tags, environment labels, cost centers. Maximum 50 keys, values must be JSON-serializable.
{
"environment": "production",
"project": "video-pipeline"
}

Successful Response
Bucket-scoped configuration for automated storage synchronization.
Defines how files are synced from external storage providers to a Mixpeek bucket. Includes configuration, status, metrics, and robustness control fields.
Supported Providers: google_drive, s3, snowflake, sharepoint, tigris
Built-in Robustness: DLQ with retries, idempotent ingestion, distributed locking (lock holder and expiry fields below), and pause/resume controls.
Metrics Fields: cumulative files found, synced, and failed, bytes transferred, last/next sync timestamps, and the most recent error message.
Target bucket identifier (e.g. 'bkt_marketing_assets').
Storage connection identifier (e.g. 'conn_abc123').
Organization internal identifier (multi-tenancy scope).
Namespace identifier owning the bucket.
Source path in the external storage provider. Format varies by provider: s3/tigris='bucket/prefix', google_drive='folder_id', sharepoint='/sites/Name/Documents', snowflake='DB.SCHEMA.TABLE'.
User identifier that created the sync configuration.
Unique identifier for the sync configuration.
Optional filter rules limiting which files are synced.
Schema mapping defining how source data maps to bucket schema fields. Maps external storage attributes (tags, metadata, columns, filenames) to bucket schema fields and blob properties. When provided, enables structured extraction of metadata from the sync source. See SchemaMapping for detailed configuration options.
Sync mode controlling lifecycle (initial_only or continuous).
initial_only, continuous
Polling interval in seconds (continuous mode).
30 <= x <= 900
Number of files processed per sync batch.
1 <= x <= 100
Whether objects should be created immediately after confirmation.
Skip files whose hashes already exist in the bucket.
If True, sync objects to the bucket without creating/submitting batches for processing.
Current lifecycle status for the sync configuration. PENDING: Not yet started. ACTIVE: Currently running/polling. SUSPENDED: Temporarily paused. COMPLETED: Initial sync completed (for initial_only mode). FAILED: Sync encountered errors.
PENDING, IN_PROGRESS, PROCESSING, COMPLETED, COMPLETED_WITH_ERRORS, FAILED, CANCELED, UNKNOWN, SKIPPED, DRAFT, ACTIVE, ARCHIVED, SUSPENDED
Convenience flag used for filtering active syncs.
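When polling a sync configuration, client code usually only needs to distinguish terminal states from ones worth re-polling. A sketch using the statuses listed in this reference; the grouping into "terminal" versus "still running" is an illustrative client-side convention, not documented API behavior:

```python
# Status names are taken from this reference; the partition is illustrative.
TERMINAL = {"COMPLETED", "COMPLETED_WITH_ERRORS", "FAILED",
            "CANCELED", "ARCHIVED"}

def is_done(status: str) -> bool:
    """True when no further polling of the sync config is needed."""
    return status in TERMINAL

is_done("ACTIVE")     # False: sync is still running/polling
is_done("COMPLETED")  # True: e.g. an initial_only sync that finished
```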
Cumulative count of files found in source across all runs.
x >= 0
Cumulative count of successfully synced files.
x >= 0
Cumulative count of failed files (sent to DLQ after 3 retries).
x >= 0
Cumulative bytes transferred across all runs.
x >= 0
When sync configuration was created.
Last modification timestamp.
When last successful sync completed. Used for incremental syncs.
Scheduled time for next sync (continuous/scheduled modes).
Most recent error message if sync attempts failed.
1000
x >= 0
Arbitrary metadata supplied by the user.
Worker ID that currently holds the lock for this sync
Timestamp when lock was acquired
Timestamp when lock expires (for stale lock recovery)
Whether sync is currently paused (user-controlled)
Reason for pause
Timestamp when paused
User who paused the sync
Hard cap on objects per sync run (prevents runaway syncs)
x >= 1
Maximum objects per batch chunk
1 <= x <= 1000
Number of objects per batch chunk (for concurrent processing)
1 <= x <= 1000
UUID for current/last sync run
Increments on each sync execution
x >= 0
List of batch IDs created by this sync
List of task IDs for batches
Total number of batches created
x >= 0
Whether resuming partial runs is enabled
Last page/cursor processed (for paginated APIs like Google Drive)
Last primary key processed (for database syncs with stable ordering)
Count of objects processed in current/last run
x >= 0
How often to checkpoint (in objects). Default: every 1000 objects
100 <= x <= 10000
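The resume fields above (last cursor, last key, processed count, checkpoint frequency) suggest a standard checkpoint-every-N pattern. A sketch of how a worker might use them; the in-memory list stands in for durable checkpoint storage and is not a Mixpeek API:

```python
# Illustrative checkpoint-every-N loop over a stream of source objects.
def run_sync(objects, checkpoint_frequency: int = 1000):
    """Process objects, recording a resumable checkpoint every N objects."""
    checkpoints = []  # stand-in for durable checkpoint storage
    processed = 0
    for obj_id in objects:
        processed += 1  # ... ingest obj_id here ...
        if processed % checkpoint_frequency == 0:
            checkpoints.append((obj_id, processed))  # last key + count
    return processed, checkpoints

# 2500 objects with the default frequency yields checkpoints at 1000 and 2000,
# so a crash mid-run resumes from the last recorded key rather than object 0.
run_sync(f"obj_{i}" for i in range(2500))
```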