Setting backfill segmentation (Public preview)
  • 22 Nov 2024


When creating a data transfer, Bobsled allows you to split your backfill operation into manageable segments. This is especially useful when dealing with large amounts of data (over 20 TB).

Segmented backfills divide the workload into "segments," or partitions, that are each moved and loaded to the destination. This reduces pressure on the underlying infrastructure and ensures that a small transient issue does not disrupt the entire long-running process.
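To make the idea of segments concrete, here is a minimal Python sketch of one way a numeric segment column could be divided into contiguous, non-overlapping ranges. Bobsled does not describe how it computes segment boundaries, so the equal-width split, the `Segment` type, and the `split_key_range` helper are hypothetical illustrations, not Bobsled's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    index: int
    lower: int  # inclusive bound on the segment key
    upper: int  # exclusive bound on the segment key


def split_key_range(min_key: int, max_key: int, num_segments: int) -> list[Segment]:
    """Hypothetical helper: split the inclusive key range [min_key, max_key]
    into num_segments contiguous ranges of (nearly) equal width."""
    if not 1 <= num_segments <= 100:  # same 1-100 limit as the Number of segments setting
        raise ValueError("num_segments must be between 1 and 100")
    if min_key > max_key:
        raise ValueError("min_key must not exceed max_key")
    total = max_key - min_key + 1  # count of distinct key values in the range
    segments = []
    for i in range(num_segments):
        # Integer arithmetic spreads any remainder across segments without gaps or overlaps.
        lower = min_key + total * i // num_segments
        upper = min_key + total * (i + 1) // num_segments
        segments.append(Segment(i, lower, upper))
    return segments
```

For example, `split_key_range(1, 1000, 4)` yields the ranges 1 to 250, 251 to 500, 501 to 750, and 751 to 1000, and each range could be read with a filter such as `key >= lower AND key < upper`. An equal-width split only produces similarly sized segments when key values are spread evenly, which is one reason the choice of segment key matters.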

For optimal performance with large tables, Bobsled recommends configuring the source as the underlying table rather than a view. Direct table access lets Bobsled better analyze data layout patterns such as partitioning, resulting in more efficient segmentation and metadata operations.

NOTE:
When configuring a partitioned table for backfill segmentation in BigQuery, ensure that your BigQuery partition key serves as both the segmentation key and the replication pattern cursor.
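As a pre-flight check, the partition-key rule for BigQuery can be written as a small Python function. The function name and its inputs are hypothetical, and it does not call Bobsled or BigQuery; you would supply the column names yourself. For column-partitioned tables, one way to find the partition column is BigQuery's `INFORMATION_SCHEMA.COLUMNS` view, where `is_partitioning_column` is `'YES'`.

```python
from __future__ import annotations


def bigquery_segmentation_issues(partition_column: str | None,
                                 segment_column: str,
                                 cursor_column: str) -> list[str]:
    """Hypothetical check: list problems with a segmentation setup for a
    partitioned BigQuery source. An empty list means the settings line up."""
    if partition_column is None:
        return []  # not a partitioned table, so the partition-key rule does not apply
    issues = []
    # BigQuery column names are case-insensitive, so compare them case-folded.
    key = partition_column.casefold()
    if segment_column.casefold() != key:
        issues.append(f"segment column {segment_column!r} should be the partition key {partition_column!r}")
    if cursor_column.casefold() != key:
        issues.append(f"replication cursor {cursor_column!r} should be the partition key {partition_column!r}")
    return issues
```

For instance, a table partitioned on `event_date` passes only when both the segment column and the cursor are `event_date`; each mismatch is reported as a separate message.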

BACKFILL SEGMENTATION PUBLIC PREVIEW (September 2024)
This feature is in public preview. It is suitable for certain production workloads but may not be appropriate for all use cases. For guidance, contact your account representative.


Prerequisites

  • A Share must be created.

  • To successfully create a data transfer in a share, you must have at least one Data source preconfigured in Bobsled.

  • If you’re sharing data with a Bobsled-managed destination, you only need to pick where you want your data to be shared. Sharing to an externally managed destination may require more details before you are ready to start creating a transfer. Check our supported Cloud Data Warehouse Destinations for more information.

NOTE:
What you can see and do will differ based on your role and permissions.


Setup instructions

  1. Within a share (with a source and cloud data warehouse destination), click the create transfer button.

  2. Choose your source objects to share and click continue.

  3. In the table transfer configuration step, choose your loading pattern.

NOTE:
Backfill segmentation is not available for the Full-table replication loading pattern.

  4. Locate and select the cog icon.

  5. On the new screen, click the Backfill segmentation toggle to enable this functionality, then:

    • Choose the Segment column.

      • The segment key is the column Bobsled uses to divide your data. This is a crucial choice, so select the key carefully. For partitioned BigQuery tables, use the partition key (see the note above).

    • Choose the total Number of segments (1–100).

      • Base the number of segments on the size of the table you are moving, and aim for each segment to be smaller than 20 TB.

  6. Click done to close the screen. Repeat steps 4 and 5 for each other table you want to segment.

  7. When finished, click continue, review your settings, and click start transfer.
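The segment-sizing guidance (each segment smaller than 20 TB, between 1 and 100 segments) can be turned into quick arithmetic. This Python sketch is an illustration only: `suggest_segment_count` is a hypothetical helper, and it assumes data splits evenly across segments, which depends on how values are distributed in your segment column.

```python
import math

MAX_SEGMENT_TB = 20.0  # guidance: keep each segment smaller than 20 TB
MIN_SEGMENTS = 1       # the Number of segments setting accepts 1 to 100
MAX_SEGMENTS = 100


def suggest_segment_count(table_size_tb: float,
                          max_segment_tb: float = MAX_SEGMENT_TB) -> int:
    """Hypothetical helper: smallest segment count whose average segment is
    strictly below max_segment_tb, clamped to the allowed 1-100 range.
    Assumes the table divides evenly across segments."""
    if table_size_tb < 0:
        raise ValueError("table_size_tb must be non-negative")
    # floor(size / limit) + 1 keeps the average strictly under the limit,
    # e.g. 40 TB gives 3 segments (~13.3 TB each) rather than 2 x 20 TB.
    needed = math.floor(table_size_tb / max_segment_tb) + 1
    return max(MIN_SEGMENTS, min(MAX_SEGMENTS, needed))
```

For a 45 TB table, `suggest_segment_count(45)` returns 3, or about 15 TB per segment. At 2,000 TB or more, the 100-segment cap applies and the average segment is no longer below 20 TB.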

NOTE:
If an error occurs during the transfer, Bobsled resumes from the last point of failure, retaining the progress made on previous segments.
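The resume behavior can be pictured as a checkpointed loop over segments. The Python sketch below is a simplified model, not Bobsled's implementation: `run_backfill` and `flaky_load` are hypothetical, and the in-memory `checkpoint` set stands in for whatever durable progress record a real system would keep.

```python
from __future__ import annotations

from typing import Callable, Iterable


def run_backfill(segment_ids: Iterable[int],
                 load_segment: Callable[[int], None],
                 completed: set[int]) -> list[int]:
    """Load every segment not yet in `completed`; return those loaded in this call.

    A segment is added to `completed` only after it loads successfully. If
    `load_segment` raises, the error propagates and `completed` still lists every
    segment finished so far, so calling again resumes at the failed segment.
    """
    loaded_now: list[int] = []
    for seg in segment_ids:
        if seg in completed:
            continue  # finished in an earlier attempt; keep that progress
        load_segment(seg)  # hypothetical move-and-load of one segment
        completed.add(seg)  # record progress only after success
        loaded_now.append(seg)
    return loaded_now


# Demo: segment 2 fails once with a transient error, then succeeds on retry.
_failed_once: set[int] = set()


def flaky_load(seg: int) -> None:
    if seg == 2 and seg not in _failed_once:
        _failed_once.add(seg)
        raise RuntimeError("transient error")


checkpoint: set[int] = set()
try:
    run_backfill(range(4), flaky_load, checkpoint)
except RuntimeError:
    pass  # first attempt stops at segment 2; segments 0 and 1 stay checkpointed

resumed = run_backfill(range(4), flaky_load, checkpoint)  # loads only 2 and 3
```

Recording a segment only after it succeeds is what lets a rerun skip completed work instead of starting the whole backfill over.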

