Backfill and Segmented Backfill
  • 27 Jan 2025

Bobsled offers two types of backfill: Backfill and Segmented Backfill. Both help ensure data consistency and completeness by re-syncing source data to destination shares in specific scenarios.


Types of Backfill

Backfill

The Backfill process transfers all historical source data to a destination share in a single synchronization. It is the default behavior for some loading patterns, such as Overwrite. In Append-Only and Update-and-Append scenarios, a backfill is used to restore data consistency.
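To make the difference between a backfill and a regular incremental run concrete, here is a minimal Python sketch assuming a file-based source where each file has a modification time. `SourceFile`, `files_to_transfer`, and the `last_synced_at` watermark are hypothetical names for illustration only; they are not Bobsled's API or its actual change-tracking logic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SourceFile:
    # Hypothetical stand-in for one file or partition in the source location.
    path: str
    modified_at: datetime


def files_to_transfer(
    files: list[SourceFile], last_synced_at: Optional[datetime], backfill: bool
) -> list[SourceFile]:
    """Choose which source files a single run copies (illustrative, not Bobsled's logic)."""
    if backfill or last_synced_at is None:
        # Backfill: resend all historical source data in one synchronization.
        return list(files)
    # Incremental run: copy only files modified since the previous sync.
    return [f for f in files if f.modified_at > last_synced_at]


files = [
    SourceFile("orders/2025-01-01.parquet", datetime(2025, 1, 1)),
    SourceFile("orders/2025-01-02.parquet", datetime(2025, 1, 2)),
]
incremental = files_to_transfer(files, datetime(2025, 1, 1), backfill=False)
full = files_to_transfer(files, datetime(2025, 1, 1), backfill=True)
print([f.path for f in incremental])  # ['orders/2025-01-02.parquet']
print(len(full))  # 2
```

Treating a missing watermark the same as an explicit backfill mirrors the idea that a first transfer has no prior state to build on.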

Segmented Backfill

Segmented Backfill is used for large datasets, splitting the data into smaller, manageable segments. For example, a 100TB dataset may be divided into 20 segments of 5TB each. Users can customize the segmentation settings, or Bobsled will suggest default options. Learn more about configuring segmented backfill.
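The 100TB example is ceiling division: the number of segments is the total size divided by the segment size, rounded up. The Python sketch below assumes segments are contiguous byte ranges with a fixed maximum size; `plan_segments` is a hypothetical helper, and Bobsled's real segmentation settings may split data along other boundaries.

```python
TB = 10**12  # decimal terabyte, for illustration


def plan_segments(total_bytes: int, segment_bytes: int) -> list[tuple[int, int]]:
    """Split [0, total_bytes) into contiguous (start, end) ranges of at most segment_bytes."""
    if total_bytes < 0 or segment_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and segment_bytes must be > 0")
    # Integer ceiling division: a final partial segment still counts as one segment.
    count = (total_bytes + segment_bytes - 1) // segment_bytes
    return [
        (i * segment_bytes, min((i + 1) * segment_bytes, total_bytes))
        for i in range(count)
    ]


segments = plan_segments(100 * TB, 5 * TB)
print(len(segments))  # 20
print(plan_segments(11, 5))  # [(0, 5), (5, 10), (10, 11)]
```

Integer arithmetic avoids floating-point rounding when sizes are large, and the last range is shortened rather than padded when the total is not an exact multiple of the segment size.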


When does a Backfill occur?

User-Initiated Backfill

A user-initiated backfill occurs in two cases:

  • When data is transferred to a new share for the first time.

  • When a user selects Backfill on Next Run or Backfill all in the Edit Transfer GUI.

Bobsled-Initiated Backfill

Bobsled automatically triggers a backfill under the following conditions:

  • Modifications to source or destination locations.

  • Changes in loading or unloading behaviors, including settings like primaryKeys, deleteFlags, or fileOptions.

  • Updates to glob filters or source view definitions.
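Taken together, the user- and Bobsled-initiated triggers amount to comparing the previous and current transfer configuration. The Python sketch below assumes a configuration object whose fields (`source_location`, `primary_keys`, `glob_filter`, and so on) are hypothetical names chosen to mirror the conditions listed above; `TransferConfig` and `backfill_reasons` are not Bobsled's API or its actual detection logic.

```python
from __future__ import annotations

from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TransferConfig:
    # Hypothetical fields mirroring the backfill trigger conditions.
    source_location: str
    destination_location: str
    loading_behavior: str
    primary_keys: tuple[str, ...] = ()
    delete_flags: tuple[str, ...] = ()
    file_options: tuple[tuple[str, str], ...] = ()
    glob_filter: str = "*"
    view_definition: str = ""


def backfill_reasons(
    previous: TransferConfig | None,
    current: TransferConfig,
    user_requested: bool = False,
) -> list[str]:
    """Return why the next run should backfill; an empty list means no backfill."""
    reasons: list[str] = []
    if user_requested:
        # User-initiated: Backfill on Next Run / Backfill all was selected.
        reasons.append("user requested backfill")
    if previous is None:
        # User-initiated: first transfer into a new share.
        reasons.append("first transfer to a new share")
        return reasons
    # Bobsled-initiated: any change to a tracked setting forces a full resync.
    for f in fields(TransferConfig):
        if getattr(previous, f.name) != getattr(current, f.name):
            reasons.append(f"changed: {f.name}")
    return reasons


old = TransferConfig("s3://example-bucket/orders/", "example_share", "APPEND_ONLY")
new = replace(old, primary_keys=("order_id",))
print(backfill_reasons(old, new))  # ['changed: primary_keys']
print(backfill_reasons(old, old))  # []
```

Returning a list of reasons rather than a boolean makes it easy to log why a full resync was scheduled.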

Snowflake Bobsled-Initiated Backfill

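For Snowflake sources, Bobsled detects whether a Secure View has been recreated by inspecting the view's creation timestamp (the created_on column) in SHOW VIEWS query results. A recreated view triggers a backfill.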
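The Snowflake check can be sketched as a timestamp comparison. In Snowflake, SHOW VIEWS reports each view's creation time in the `created_on` column, and recreating a view (for example with CREATE OR REPLACE VIEW) gives it a new timestamp. This Python sketch assumes the SHOW VIEWS rows have already been fetched as dictionaries and that the last-seen timestamp is stored per view name; `recreated_views` and that storage scheme are hypothetical, not Bobsled's implementation.

```python
from datetime import datetime, timezone


def recreated_views(
    recorded: dict[str, datetime], show_views_rows: list[dict]
) -> list[str]:
    """Names of previously seen views whose creation timestamp has changed."""
    changed = []
    for row in show_views_rows:
        name, created_on = row["name"], row["created_on"]
        # A dropped-and-recreated view (e.g. CREATE OR REPLACE) has a new created_on.
        if name in recorded and recorded[name] != created_on:
            changed.append(name)
    return changed


recorded = {"ORDERS_V": datetime(2025, 1, 1, tzinfo=timezone.utc)}
rows = [
    # Shaped like SHOW VIEWS output fetched as dictionaries (assumed shape).
    {"name": "ORDERS_V", "created_on": datetime(2025, 1, 20, tzinfo=timezone.utc)},
    {"name": "CUSTOMERS_V", "created_on": datetime(2025, 1, 5, tzinfo=timezone.utc)},
]
print(recreated_views(recorded, rows))  # ['ORDERS_V']
```

Views not seen before (here `CUSTOMERS_V`) are ignored by this check, since a view with no recorded timestamp cannot have been recreated from the transfer's point of view.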


Key Benefits

  • Data Consistency: Destination shares always reflect source data accurately.

  • Customizability: Manage large datasets effectively with Segmented Backfill.

  • Automation: Bobsled detects and initiates necessary backfills to save time and effort.

