Bobsled connects your sources and destinations, handling the “lower-case ETL” for you—extracting, transforming (lightly), and loading or replicating your data with minimal setup.
Whether you’re delivering from file storage, moving between cloud data warehouses, or sending data back to file storage, Bobsled applies patterns designed for accuracy, performance, and ease of maintenance.
Bobsled's data movement principles apply whether you're moving data directly from a source to a destination or creating Sledhouse Tables.
File Storage to Cloud Data Warehouse
When delivering from a file storage source (e.g., Amazon S3) to a cloud data warehouse (e.g., Snowflake), Bobsled turns files into tables using Bobsled data types as a bridge.
Patterns:
Append Only: Load all new records from files into the destination table. (Best for immutable datasets)
Update and Append: Merge updates into the table, ensuring only the latest record per key is present.
Recordset Overwrite: Replace entire related sets of records together.
Overwrite: Replace the entire table with new files.
True Up: Combine daily incremental updates with periodic full overwrites.
Change Data Capture (CDC): Apply inserts, updates, and deletes from a change stream.
Typical syncs: hourly, daily, or aligned with upstream file drops. CDC may run continuously. For more in-depth information, visit the File Storage to Cloud Data Warehouse transfer guide.
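As an illustration of the Update and Append pattern above, here is a minimal Python sketch of merging records so only the latest version per key survives. This is illustrative only, not Bobsled's implementation; the `id` and `loaded_at` field names are assumptions.

```python
# Illustrative sketch of the Update and Append pattern (not Bobsled internals).
# Each record is modeled as a dict with a unique "id" key and a "loaded_at"
# ordering column that decides which version of a record is "latest".

def update_and_append(table: dict, new_records: list[dict]) -> dict:
    """Merge new records so only the latest record per key is present."""
    for record in new_records:
        existing = table.get(record["id"])
        if existing is None or record["loaded_at"] >= existing["loaded_at"]:
            table[record["id"]] = record
    return table
```

New keys are appended, existing keys are updated in place, and stale versions of a record never overwrite newer ones.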
File Storage to File Storage
When replicating between file storage systems (e.g., S3 to Azure Blob Storage), Bobsled mirrors the structure of the source bucket. Depending on configuration:
Mirror: Keep the destination exactly in sync (add/remove files).
Append-Only: Only add new files to the destination.
Typical syncs: daily or hourly for static datasets, more frequent for high-change environments. For more in-depth information, visit the File Storage to File Storage transfer guide.
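The Mirror and Append-Only behaviors above can be sketched as a set comparison of object keys. This is a simplified illustration; a real replicator would also compare checksums or timestamps to catch changed files.

```python
# Illustrative sketch only — file listings are modeled as sets of object keys.

def plan_sync(source_keys: set, dest_keys: set, mode: str):
    """Return (files to copy, files to delete) for one sync run."""
    to_copy = source_keys - dest_keys
    # Mirror removes files that no longer exist in the source;
    # Append-Only never deletes from the destination.
    to_delete = dest_keys - source_keys if mode == "mirror" else set()
    return to_copy, to_delete
```

The only difference between the two modes is whether deletions in the source propagate to the destination.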
Cloud Data Warehouse to Cloud Data Warehouse
When moving tables between warehouses (e.g., BigQuery to Databricks), Bobsled mirrors source tables into the destination with a focus on correctness, using Bobsled data types as a bridge.
Patterns:
Full-Table Replication: Replace the destination table with the full source table each sync.
Incremental Replication (Append Only): Add new rows based on a created_at column.
Incremental Replication (Update and Append): Add/update rows based on a last_modified_at column and unique key.
Change Tracking (Snowflake only): Use Snowflake change streams to replicate inserts, updates, and deletes.
Typical syncs:
Full-table: daily or weekly.
Incremental: hourly or near-real-time.
Change tracking: continuous or high-frequency.
For more in-depth information, visit the Cloud Data Warehouse to Cloud Data Warehouse transfer guide.
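A hedged sketch of the watermark logic behind incremental Append Only replication: each run copies only rows newer than the high-water mark from the previous run. The source is modeled here as a list of row dicts rather than a live warehouse connection; `created_at` follows the column naming used above.

```python
# Illustrative watermark-based incremental replication (Append Only variant).
# Not Bobsled's engine — a minimal model of the idea.

def incremental_append(source_rows: list, destination: list, watermark):
    """Copy rows newer than the last-seen watermark; return the new watermark."""
    new_rows = [r for r in source_rows if r["created_at"] > watermark]
    destination.extend(new_rows)
    # If nothing new arrived, the watermark stays where it was.
    return max((r["created_at"] for r in new_rows), default=watermark)
```

Persisting the returned watermark between runs is what makes each sync incremental rather than a full re-read.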
Cloud Data Warehouse to File Storage
When delivering from a warehouse to file storage, Bobsled extracts tables into structured folder paths.
Patterns:
Full-Table Replication: Export the entire table as compressed files.
Incremental Replication (Append Only): Write only new rows.
Incremental Replication (Update and Append): Write updated/new rows, ensuring the latest state per key.
Change Tracking (Snowflake only): Capture inserts, updates, and deletes for accurate downstream files.
Typical syncs:
Batch exports: daily or weekly.
Incremental: hourly or aligned with downstream consumption windows.
Change tracking: near-real-time.
For more in-depth information, visit the Cloud Data Warehouse to File Storage transfer guide.
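One plausible way to derive a structured, date-partitioned folder path for exported files. The prefix and layout shown are assumptions for illustration, not Bobsled's actual folder scheme.

```python
# Illustrative only: building a Hive-style date-partitioned export path.
from datetime import date

def export_path(table: str, run_date: date, part: int) -> str:
    """Build a date-partitioned file path for an exported table chunk."""
    return (f"{table}/year={run_date.year}/month={run_date.month:02d}/"
            f"day={run_date.day:02d}/part-{part:05d}.parquet")
```

Partitioning exports by date keeps each sync's output isolated and makes downstream pruning by date cheap.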
Choosing the right pattern
| Scenario | Recommended Pattern | Typical Sync |
| --- | --- | --- |
| Immutable event logs | Append Only | Hourly or daily |
| Frequently updated records | Update and Append | Hourly |
| Multi-row record units | Recordset Overwrite | Hourly or daily |
| Periodic full refresh | Overwrite / Full-Table | Weekly or monthly |
| Incrementals + periodic full | True Up | Daily with weekly full |
| Real-time change streams | CDC / Change Tracking | Continuous |
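The True Up pattern combines two of the modes above: incremental loads on most runs, with a periodic full overwrite to correct any drift. A minimal sketch of choosing the mode per run; the weekly cadence and the choice of Sunday are illustrative assumptions.

```python
# Illustrative True Up scheduling: append most days, overwrite on the
# weekly true-up day. Cadence and weekday are assumptions, not defaults.
from datetime import date

def true_up_mode(run_date: date, full_refresh_weekday: int = 6) -> str:
    """Pick the load mode for this run: 'overwrite' on the true-up day."""
    return "overwrite" if run_date.weekday() == full_refresh_weekday else "append"
```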
TIP:
• If unsure which pattern is right for your data, contact your sales representative or book a demo
• Bobsled recommends a backfill if configuration changes could affect how data is loaded or queried, but you’re in control—except for Local Copies, which always sync with their parent Sledhouse Table.
How does Bobsled keep your data in sync?
Each transfer or Sledhouse Table has a sync schedule that determines when the chosen replication/loading pattern runs. You can:
Run on a schedule: e.g., every hour, daily, weekly.
Trigger manually: start a one-off run whenever needed.
Trigger on demand via API: perfect for integration with upstream workflows.
Combine with event-based automation: for near-real-time pipelines, e.g., CDC or change tracking.
Some patterns (like CDC or change tracking) may run continuously or at high frequency to capture changes as soon as possible.
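Continuous patterns such as CDC apply a stream of change events in order. A minimal sketch, assuming each event carries an `op` ("insert", "update", or "delete"), a `key`, and a row image; this models the idea, not Bobsled's engine.

```python
# Illustrative CDC apply loop: replay a change stream so the destination
# table reflects the latest source state.

def apply_changes(table: dict, events: list[dict]) -> dict:
    """Apply inserts, updates, and deletes from a change stream in order."""
    for event in events:
        if event["op"] == "delete":
            table.pop(event["key"], None)
        else:
            # Inserts and updates both upsert the latest row image.
            table[event["key"]] = event["row"]
    return table
```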
Need to backfill data for your customers?
Bobsled allows you to run a backfill to repopulate historical records—and informs you if one is advised. This is useful if:
You’ve added new fields and want old data filled in
You need to realign the destination with the full source history
A sync that was otherwise successful resulted in an error