Backfill
  • 12 May 2025

Backfilling helps ensure data consistency and completeness by re-syncing source data to destination shares under specific scenarios.

The Backfill process transfers all historical source data to a destination share in a single synchronization. When loading into a data warehouse, it also resets the schema on the destination table.


When does a Backfill occur?

A backfill occurs in two situations:

  • Transferring data for the first time to a new share.

  • Selecting Backfill on Next Run or Backfill all while editing a transfer in the Bobsled Application.

When is a Backfill recommended?

To help ensure data consistency, Bobsled displays an alert in the application when a backfill is recommended. This happens under the following conditions:

  • Modifications to source or destination locations.

  • Changes in loading or unloading behaviors, including settings like primaryKeys, deleteFlags, or fileOptions.

  • Updates to glob filters or source view definitions.

NOTE:
Skipping a recommended backfill may result in inconsistent data or a failed transfer. If a transfer fails as a result, you can manually trigger a backfill later from the UI.
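
As an illustration only, the conditions listed above can be thought of as a comparison between a transfer's settings before and after an edit. The sketch below is not Bobsled's implementation: the TransferConfig class, its field names, and the backfill_reasons function are hypothetical names invented for this example, and the locations are placeholder values.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TransferConfig:
    """Hypothetical snapshot of a transfer's settings (not Bobsled's real schema)."""
    source_location: str
    destination_location: str
    # Assumed container for settings such as primaryKeys, deleteFlags, fileOptions.
    loading_options: dict = field(default_factory=dict)
    glob_filter: str = ""
    source_view_definition: str = ""


def backfill_reasons(old: TransferConfig, new: TransferConfig) -> list[str]:
    """Return the reasons (if any) an edit would make a backfill advisable."""
    reasons = []
    if (old.source_location != new.source_location
            or old.destination_location != new.destination_location):
        reasons.append("source or destination location changed")
    if old.loading_options != new.loading_options:
        reasons.append("loading or unloading behavior changed")
    if (old.glob_filter != new.glob_filter
            or old.source_view_definition != new.source_view_definition):
        reasons.append("glob filter or source view definition changed")
    return reasons


# Placeholder values for illustration.
before = TransferConfig("s3://example-bucket/events/", "example-share",
                        {"primaryKeys": ["id"]}, "*.parquet")
after = TransferConfig("s3://example-bucket/events/", "example-share",
                       {"primaryKeys": ["id", "region"]}, "*.parquet")

print(backfill_reasons(before, after))
# ['loading or unloading behavior changed']
```

An empty list means none of the listed conditions apply, so under this sketch no backfill would be recommended.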


Snowflake backfill behaviour on view change (if enabled)

NOTE:
Bobsled detects whether a Secure View has been recreated by inspecting the 'created_at' field in the 'SHOW VIEWS' query results.
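
A minimal sketch of this kind of check, assuming the query results are already available as a list of dictionaries and that the creation timestamp seen on the previous run has been stored somewhere. The view_was_recreated function, the row layout, and the sample view name are assumptions made for illustration. The 'created_at' key follows the field name used in this article and should be verified against the actual query output.

```python
from datetime import datetime


def view_was_recreated(show_views_rows, view_name, last_seen_created_at):
    """Return True if the named view's creation time is newer than the one last recorded.

    show_views_rows: list of dicts, one per view (assumed shape of the query results).
    last_seen_created_at: datetime recorded on the previous run (assumed to be stored).
    """
    for row in show_views_rows:
        if row["name"] == view_name:
            # A newer creation timestamp implies the view was dropped and recreated.
            return row["created_at"] > last_seen_created_at
    raise LookupError(f"view {view_name!r} not found in query results")


# Hypothetical rows for illustration.
rows = [{"name": "ORDERS_V", "created_at": datetime(2025, 5, 10, 8, 30)}]

print(view_was_recreated(rows, "ORDERS_V", datetime(2025, 5, 1)))
# True
```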


Key Benefits

  • Data Consistency: Ensure destination shares always reflect source data accurately.

  • Automation: Bobsled detects and initiates necessary backfills to save time and effort.

Considerations

  • Sources with TTL policies: When backfilling from a source that enforces a TTL policy, any data in the share that is older than the TTL window will not be restored, because it no longer exists in the source.

    • Example: A share has been active for 30 days, and the source bucket has a 10-day TTL policy. After a backfill is initiated, the share will contain only the most recent 10 days of data, which is all that remains in the source bucket.
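
The arithmetic in the example above can be sketched as follows. This is an illustration under simple assumptions (whole-day granularity, TTL measured from the backfill date), and the function name and the dates are hypothetical.

```python
from datetime import date, timedelta


def earliest_date_after_backfill(backfill_date, share_start_date, source_ttl_days):
    """Return the oldest data date still present in the share after a backfill."""
    # The source only retains data newer than the TTL window.
    earliest_in_source = backfill_date - timedelta(days=source_ttl_days)
    # The share cannot contain data from before it started receiving data.
    return max(share_start_date, earliest_in_source)


backfill_date = date(2025, 5, 31)
share_start = backfill_date - timedelta(days=30)  # share active for 30 days

earliest = earliest_date_after_backfill(backfill_date, share_start, source_ttl_days=10)
print(earliest)                         # 2025-05-21
print((backfill_date - earliest).days)  # 10
```

If the TTL window were longer than the share's lifetime, the result would simply be the share's start date and no data would be lost.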

