POST /v1/buckets/{bucket_id}/syncs
Create Sync Configuration
Example request:

curl --request POST \
  --url https://api.mixpeek.com/v1/buckets/{bucket_id}/syncs \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'X-Namespace: my-namespace' \
  --header 'Content-Type: application/json' \
  --data '
{
  "connection_id": "<string>",
  "source_path": "<string>",
  "sync_mode": "continuous",
  "file_filters": {
    "extensions": [
      ".mp4",
      ".mov"
    ]
  },
  "schema_mapping": {
    "mappings": {
      "category": {
        "source": {
          "key": "category",
          "type": "tag"
        },
        "target_type": "field"
      },
      "content": {
        "blob_type": "auto",
        "source": {
          "type": "file"
        },
        "target_type": "blob"
      }
    }
  },
  "polling_interval_seconds": 300,
  "batch_size": 50,
  "skip_batch_submission": false,
  "metadata": {
    "environment": "production",
    "project": "video-pipeline"
  }
}
'
Example response:

{
  "bucket_id": "<string>",
  "connection_id": "<string>",
  "internal_id": "<string>",
  "namespace_id": "<string>",
  "source_path": "<string>",
  "created_by_user_id": "<string>",
  "sync_config_id": "<string>",
  "file_filters": {
    "include_patterns": [
      "<string>"
    ],
    "exclude_patterns": [
      "<string>"
    ],
    "min_size_bytes": 1,
    "max_size_bytes": 1,
    "modified_after": "2023-11-07T05:31:56Z",
    "modified_before": "2023-11-07T05:31:56Z",
    "mime_types": [
      "<string>"
    ]
  },
  "schema_mapping": {
    "mappings": {}
  },
  "sync_mode": "continuous",
  "polling_interval_seconds": 300,
  "batch_size": 50,
  "create_object_on_confirm": true,
  "skip_duplicates": true,
  "skip_batch_submission": false,
  "status": "PENDING",
  "is_active": true,
  "total_files_discovered": 0,
  "total_files_synced": 0,
  "total_files_failed": 0,
  "total_bytes_synced": 0,
  "created_at": "2023-11-07T05:31:56Z",
  "updated_at": "2023-11-07T05:31:56Z",
  "last_sync_at": "2023-11-07T05:31:56Z",
  "next_sync_at": "2023-11-07T05:31:56Z",
  "last_error": "<string>",
  "consecutive_failures": 0,
  "metadata": {},
  "locked_by_worker_id": "<string>",
  "locked_at": "2023-11-07T05:31:56Z",
  "lock_expires_at": "2023-11-07T05:31:56Z",
  "paused": false,
  "pause_reason": "<string>",
  "paused_at": "2023-11-07T05:31:56Z",
  "paused_by_user_id": "<string>",
  "max_objects_per_run": 100000,
  "max_batch_chunk_size": 1000,
  "batch_chunk_size": 100,
  "current_sync_run_id": "<string>",
  "sync_run_counter": 0,
  "batch_ids": [
    "<string>"
  ],
  "task_ids": [
    "<string>"
  ],
  "batches_created": 0,
  "resume_enabled": true,
  "resume_cursor": "<string>",
  "resume_last_primary_key": "<string>",
  "resume_objects_processed": 0,
  "resume_checkpoint_frequency": 1000
}
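The response echoes the full configuration document. As a quick sketch, assuming you save it to a file such as response.json, jq can pull out the identifiers you'll need for follow-up calls:

jq '{sync_config_id, status, next_sync_at}' response.json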

Headers

Authorization
string

REQUIRED: Bearer token authentication using your API key. Format: 'Bearer sk_xxxxxxxxxxxxx'. You can create API keys in the Mixpeek dashboard under Organization Settings.

Examples:

"Bearer YOUR_API_KEY"

"Bearer YOUR_STRIPE_API_KEY"

X-Namespace
string

REQUIRED: Namespace identifier for scoping this request. All resources (collections, buckets, taxonomies, etc.) are scoped to a namespace. You can provide either the namespace name or namespace ID. Format: ns_xxxxxxxxxxxxx (ID) or a custom name like 'my-namespace'

Examples:

"ns_abc123def456"

"production"

"my-namespace"

Path Parameters

bucket_id
string
required

Body

application/json

Request to create a bucket sync configuration.

Establishes automated synchronization between a storage connection and a bucket. The sync monitors the source path for changes and ingests files according to the specified mode and filters.

Supported Storage Providers:

  • google_drive: Google Drive and Workspace shared drives
  • s3: Amazon S3 and S3-compatible stores (MinIO, DigitalOcean Spaces, Wasabi)
  • snowflake: Snowflake data warehouse tables (rows become objects)
  • sharepoint: Microsoft SharePoint and OneDrive for Business
  • tigris: Tigris globally distributed object storage

Robustness Features (built-in):

  • Dead Letter Queue (DLQ): Failed objects are retried 3 times, then quarantined
  • Idempotent ingestion: Deduplication via (bucket_id, source_provider, source_object_id)
  • Distributed locking: Prevents concurrent execution of the same sync config
  • Rate limit handling: Automatic backoff on provider 429 responses
  • Metrics: Duration, files synced/failed, batches created, rate limit hits

Sync Modes:

  • continuous: Actively monitors the source on a polling interval and syncs new files as they appear
  • initial_only: Performs a single bulk import of existing files, then stops

Requirements (a minimal example follows this list):

  • connection_id: REQUIRED, must reference an existing connection
  • source_path: REQUIRED, the path must exist in the storage provider
  • sync_mode: OPTIONAL, defaults to 'continuous'
  • All other fields are OPTIONAL with sensible defaults
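For example, a minimal request that relies on the defaults needs only the two required fields; the bucket, connection, and namespace values below are placeholders drawn from the examples on this page:

curl --request POST \
  --url https://api.mixpeek.com/v1/buckets/bkt_marketing_assets/syncs \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'X-Namespace: my-namespace' \
  --header 'Content-Type: application/json' \
  --data '{
    "connection_id": "conn_s3_prod",
    "source_path": "my-bucket/videos"
  }'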

connection_id
string
required

REQUIRED. Storage connection identifier to sync from. Must reference an existing connection created via POST /organizations/connections. The connection defines the storage provider and credentials. Supported providers: google_drive, s3, snowflake, sharepoint, tigris.

Examples:

"conn_abc123"

"conn_s3_prod"

source_path
string
required

REQUIRED. Source path within the storage provider to monitor and sync. Path format varies by provider:

  • s3/tigris: 'bucket-name/prefix' or 'bucket-name'
  • google_drive: folder ID or a path like '/Marketing/Assets'
  • sharepoint: '/sites/SiteName/Shared Documents/folder'
  • snowflake: 'DATABASE.SCHEMA.TABLE', or just 'TABLE' if defaults are set

Examples:

"my-bucket/videos"

"0AH-Xabc123"

"PROD.PUBLIC.CUSTOMERS"

sync_mode
enum<string>
default:continuous

Synchronization mode determining how files are monitored and ingested. OPTIONAL. Defaults to 'continuous'. 'continuous': Actively monitors for new files on each polling interval and syncs them as they appear. 'initial_only': Performs a single sync of existing files, then stops.

Available options:
initial_only,
continuous
Examples:

"continuous"

"initial_only"

file_filters
File Filters · object

OPTIONAL. Filters to control which files are synced. When omitted, all files in source_path are synced. Supported filters:

  • include_patterns: Glob patterns to include (e.g., ['*.mp4', '*.mov'])
  • exclude_patterns: Glob patterns to exclude (e.g., ['*.tmp', '.DS_Store'])
  • extensions: File extensions to include (e.g., ['.mp4', '.jpg'])
  • min_size_bytes: Minimum file size in bytes
  • max_size_bytes: Maximum file size in bytes
  • modified_after: ISO datetime; only sync files modified after this time
  • mime_types: MIME types to include (e.g., ['video/*', 'image/jpeg'])

Example:
{ "extensions": [".mp4", ".mov"] }
schema_mapping
SchemaMapping · object

OPTIONAL. Defines how source data maps to bucket schema fields and blobs. When provided, enables structured extraction of metadata from the sync source. Keys are target bucket schema field names, values define the source extraction method.

Blob Mappings (target_type='blob'): Map files or URLs to blob fields. Use source.type='file' for the synced file itself, or source.type='column'/'metadata' for URLs.

Field Mappings (target_type='field'): Map metadata to schema fields. Source options by provider:

  • S3/Tigris: 'tag' (object tags), 'metadata' (x-amz-meta-*)
  • Snowflake: 'column' (table columns)
  • Google Drive: 'drive_property' (file properties)
  • All providers: 'filename_regex', 'folder_path', 'constant'

If omitted, the default behavior depends on the provider; typically the synced file maps to the 'content' blob.

Example:
{
  "mappings": {
    "category": {
      "source": { "key": "category", "type": "tag" },
      "target_type": "field"
    },
    "content": {
      "blob_type": "auto",
      "source": { "type": "file" },
      "target_type": "blob"
    }
  }
}
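For a Snowflake source, 'column' mappings pull table columns into schema fields, and a URL column can feed a blob. A sketch of a full request: the connection ID and source path reuse examples from this page, while the NAME and ASSET_URL column names are hypothetical, as is using 'key' to select the column for a blob source:

# NAME and ASSET_URL are placeholder column names in the source table
curl --request POST \
  --url https://api.mixpeek.com/v1/buckets/{bucket_id}/syncs \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'X-Namespace: my-namespace' \
  --header 'Content-Type: application/json' \
  --data '{
    "connection_id": "conn_abc123",
    "source_path": "PROD.PUBLIC.CUSTOMERS",
    "schema_mapping": {
      "mappings": {
        "customer_name": {
          "source": { "key": "NAME", "type": "column" },
          "target_type": "field"
        },
        "content": {
          "blob_type": "auto",
          "source": { "key": "ASSET_URL", "type": "column" },
          "target_type": "blob"
        }
      }
    }
  }'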
polling_interval_seconds
integer
default:300

Interval in seconds between polling checks for new files. OPTIONAL. Defaults to 300 seconds (5 minutes). Must be between 30 and 900 seconds (0.5 to 15 minutes). Only applies to 'continuous' sync mode. Lower values mean faster detection but higher API usage.

Required range: 30 <= x <= 900
Examples:

60

300

600

batch_size
integer
default:50

Number of files to process in each batch during sync. OPTIONAL. Defaults to 50 files per batch. Must be between 1 and 100. Larger batches improve throughput but require more memory. Smaller batches provide more granular progress tracking.

Required range: 1 <= x <= 100
Examples:

10

50

100

skip_batch_submission
boolean
default:false

If true, sync objects to the bucket without creating or submitting batches for collection processing. Objects are created in the bucket but no tier processing is triggered. Useful for bulk data migration or when you want to manually control when processing occurs. OPTIONAL. Defaults to false (batches are created and submitted).

Examples:

false

true
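For a one-shot bulk migration with processing deferred, the relevant request-body fragment might look like this (IDs and path are placeholders reused from the examples above):

{
  "connection_id": "conn_s3_prod",
  "source_path": "my-bucket/videos",
  "sync_mode": "initial_only",
  "skip_batch_submission": true
}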

metadata
Metadata · object

Custom metadata to attach to the sync configuration. OPTIONAL. Arbitrary key-value pairs for tagging and organization. Common uses: project tags, environment labels, cost centers. Maximum 50 keys; values must be JSON-serializable.

Example:
{
  "environment": "production",
  "project": "video-pipeline"
}

Response

Successful Response

Bucket-scoped configuration for automated storage synchronization.

Defines how files are synced from external storage providers to a Mixpeek bucket. Includes configuration, status, metrics, and robustness control fields.

Supported Providers: google_drive, s3, snowflake, sharepoint, tigris

Built-in Robustness:

  • Distributed locking (locked_by_worker_id, lock_expires_at)
  • Pause/resume control (paused, pause_reason, paused_at)
  • Safety limits (max_objects_per_run, batch_chunk_size)
  • Resume checkpointing (resume_cursor, resume_objects_processed)
  • Batch tracking (batch_ids, task_ids, batches_created)

Metrics Fields:

  • total_files_discovered: Files found in source
  • total_files_synced: Successfully synced files
  • total_files_failed: Files that failed (check DLQ)
  • total_bytes_synced: Total data transferred
  • consecutive_failures: Failure count for auto-suspend
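As a sketch, with a configuration document saved to a file such as sync.json, these counters can be checked at a glance with jq:

jq '{status, total_files_synced, total_files_failed, consecutive_failures, last_error}' sync.json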
bucket_id
string
required

Target bucket identifier (e.g. 'bkt_marketing_assets').

connection_id
string
required

Storage connection identifier (e.g. 'conn_abc123').

internal_id
string
required

Organization internal identifier (multi-tenancy scope).

namespace_id
string
required

Namespace identifier owning the bucket.

source_path
string
required

Source path in the external storage provider. Format varies by provider: s3/tigris='bucket/prefix', google_drive='folder_id', sharepoint='/sites/Name/Documents', snowflake='DB.SCHEMA.TABLE'.

created_by_user_id
string
required

User identifier that created the sync configuration.

sync_config_id
string

Unique identifier for the sync configuration.

file_filters
FileFilters · object

Optional filter rules limiting which files are synced.

schema_mapping
SchemaMapping · object

Schema mapping defining how source data maps to bucket schema fields. Maps external storage attributes (tags, metadata, columns, filenames) to bucket schema fields and blob properties. When provided, enables structured extraction of metadata from the sync source. See SchemaMapping for detailed configuration options.

sync_mode
enum<string>
default:continuous

Sync mode controlling lifecycle (initial_only or continuous).

Available options:
initial_only,
continuous
polling_interval_seconds
integer
default:300

Polling interval in seconds (continuous mode).

Required range: 30 <= x <= 900
batch_size
integer
default:50

Number of files processed per sync batch.

Required range: 1 <= x <= 100
create_object_on_confirm
boolean
default:true

Whether objects should be created immediately after confirmation.

skip_duplicates
boolean
default:true

Skip files whose hashes already exist in the bucket.

skip_batch_submission
boolean
default:false

If true, sync objects to the bucket without creating/submitting batches for processing.

status
enum<string>
default:PENDING

Current lifecycle status for the sync configuration. PENDING: Not yet started. ACTIVE: Currently running/polling. SUSPENDED: Temporarily paused. COMPLETED: Initial sync completed (for initial_only mode). FAILED: Sync encountered errors.

Available options:
PENDING,
IN_PROGRESS,
PROCESSING,
COMPLETED,
COMPLETED_WITH_ERRORS,
FAILED,
CANCELED,
UNKNOWN,
SKIPPED,
DRAFT,
ACTIVE,
ARCHIVED,
SUSPENDED
is_active
boolean
default:true

Convenience flag used for filtering active syncs.

total_files_discovered
integer
default:0

Cumulative count of files found in source across all runs.

Required range: x >= 0
total_files_synced
integer
default:0

Cumulative count of successfully synced files.

Required range: x >= 0
total_files_failed
integer
default:0

Cumulative count of failed files (sent to DLQ after 3 retries).

Required range: x >= 0
total_bytes_synced
integer
default:0

Cumulative bytes transferred across all runs.

Required range: x >= 0
created_at
string<date-time>

When the sync configuration was created.

updated_at
string<date-time>

Last modification timestamp.

last_sync_at
string<date-time> | null

When the last successful sync completed. Used for incremental syncs.

next_sync_at
string<date-time> | null

Scheduled time of the next sync (continuous mode).

last_error
string | null

Most recent error message if sync attempts failed.

Maximum string length: 1000
consecutive_failures
integer
default:0

Count of consecutive failed sync attempts (used for auto-suspend).

Required range: x >= 0
metadata
Metadata · object

Arbitrary metadata supplied by the user.

locked_by_worker_id
string | null

Worker ID that currently holds the lock for this sync.

locked_at
string<date-time> | null

Timestamp when the lock was acquired.

lock_expires_at
string<date-time> | null

Timestamp when the lock expires (for stale lock recovery).

paused
boolean
default:false

Whether the sync is currently paused (user-controlled).

pause_reason
string | null

Reason for the pause.

paused_at
string<date-time> | null

Timestamp when the sync was paused.

paused_by_user_id
string | null

User who paused the sync.

max_objects_per_run
integer
default:100000

Hard cap on objects per sync run (prevents runaway syncs).

Required range: x >= 1
max_batch_chunk_size
integer
default:1000

Maximum objects per batch chunk.

Required range: 1 <= x <= 1000
batch_chunk_size
integer
default:100

Number of objects per batch chunk (for concurrent processing).

Required range: 1 <= x <= 1000
current_sync_run_id
string | null

UUID of the current or most recent sync run.

sync_run_counter
integer
default:0

Increments on each sync execution.

Required range: x >= 0
batch_ids
string[]

List of batch IDs created by this sync.

task_ids
string[]

List of task IDs for those batches.

batches_created
integer
default:0

Total number of batches created.

Required range: x >= 0
resume_enabled
boolean
default:true

Whether resuming partial runs is enabled.

resume_cursor
string | null

Last page/cursor processed (for paginated APIs like Google Drive).

resume_last_primary_key
string | null

Last primary key processed (for database syncs with stable ordering).

resume_objects_processed
integer
default:0

Count of objects processed in the current or most recent run.

Required range: x >= 0
resume_checkpoint_frequency
integer
default:1000

How often to checkpoint, in objects. Defaults to every 1000 objects.

Required range: 100 <= x <= 10000