Setup guide
  • 19 Dec 2024
  • 4 Minutes to read
  • PDF

Setup guide

  • PDF

Article summary

This article will describe the steps for a Provider to configure Bobsled access to a Snowflake to be used as a source in Bobsled.


Prerequisites 

  • To configure Bobsled access to your source, access to the ACCOUNTADMIN role is required.


Setup instructions

Step 1: Grant access to your Snowflake account

Bobsled supports two modes of Snowflake source setup:

  • Standard setup and,

  • Advanced setup.

In the Bobsled app:

  1. Navigate to Data sources

  2. Select Snowflake, the cloud and the region of your Snowflake account. Click Continue to move to the next step

  3. Choose your setup mode: Standard or Advanced and follow the steps on the screen to run the associated script or adapt to your needs.

Option 1: Standard setup

This is the easiest way to setup a Snowflake Source. This requires running a script to grant the correct access to your Snowflake account. The script does the following:

  • Creates a user

  • Creates a role for that user to use

  • Grants the role the ability to:

    • Create integrations

    • Create warehouses

  • Creates a security integration for authentication assigned to the role

  • Creates a database for Bobsled to use

  • Creates a stored procedure to be used for granting proper access to the databases that Bobsled needs to access (Bobsled can access multiple databases within one source). When using the stored procedure, the user needs to enter databases that Bobsled should have access to. The stored procedure does the following:

    • Grants the role usage to all current and future schemas and select to all current and future tables and views

    • Turns change tracking on for all tables and views in those databases. Bobsled uses change tracking for efficient extracts.

Option 2: Advanced setup

Advanced setup can be chosen by a Snowflake account admin who wants more control over Bobsled's access and footprint within their Snowflake account. Instead of granting Bobsled the ability to create the required infrastructure, all infrastructure can be created and then access granted to Bobsled.

Bobsled provides a script—accessible via the source setup page—, available in SQL or Terraform, to run for setting up required Bobsled infrastructure or adapting per your needs. The terraform script is generated specific to your Bobsled deployment and the specific source setup.

  1. Select advanced setup,

  2. Use Bobsled default values, or adapt and replace them to your needs.

  3. Run the script.


Step 2: Finish Bobsled configuration

Depending on the route chosen to setup Bobsled access, you need finish configuring the source in the Bobsled App after running the relevant scrips.

Option 1: For standard setup

After you’ve run the script:

  1. Provide your Snowflake Organization name, Account name (as org_name.account_name) and Account locator. All values must be entered UPPERCASE.

    1. Easily obtain these values in snowflake with the following script:
      SELECT CURRENT_ORGANIZATION_NAME() || '.' || CURRENT_ACCOUNT_NAME() as "ORG.ACCOUNT",  CURRENT_ACCOUNT() as LOCATOR;

    2. Learn more on how to access these account identifiers ↗

  2. Choose your Scale Snowflake Warehouse Source Computing sizes for Incremental Loading and Backfill loading. Note that in the standard setup, Bobsled sets these values by default. Learn more about the scaling strategy for your use case.

  3. Once you’ve entered all the information, click Save.

Option 2: For advanced setup

After you’ve run the script:

  1. Provide your:

    • Snowflake Organization name, Account name (as org_name.account_name) and Account locator. Learn more on how to access these account identifiers ↗

  2. Enter the integration details, running the SQL command to fetch the details for the following:

    • Snowflake storage integration IAM User Role

    • Snowflake storage integration AWS External ID

    • And confirm the following:

      • Snowflake storage integration AWS Role ARN

      • Snowflake storage allowed locations

  3. Once you’ve entered all the information, click Save.


Once you’ve successfully configured your Source, you can add it to a Share, choose a Destination, and a create a transfer to start sharing your data.


Warehouse compute scaling

Scaling Strategy

Bobsled aims to complete data unloads within 30 minutes. This is because Snowflake can scale unloads linearly, allowing us to provide data to our customers quickly and facilitating fast feedback loops for delivery teams.

Snowflake's performance allows it to unload 100 GB on an XS (one compute node) in approximately 10 minutes. This linear scalability results in the following performance metrics:

  • 10 GB:  Approximately 1 minute on XS

  • 100 GB: Approximately 10 minutes on XS

  • 500 GB: Approximately 10 minutes on Medium

  • 1 TB: Approximately 15 minutes on Large

  • 8 TB: Approximately 30 minutes on 2XLarge

Configuration Options

Bobsled offers support for both a backfill-focused warehouse and an incremental warehouse. This configuration enables small incremental syncs on a small warehouse while using a large warehouse for substantial initial loads and backfills.

For clients utilizing the standard setup, Bobsled will create the necessary warehouses in your Snowflake account. Collaborate with your account team to ensure the warehouses are appropriately sized.

For clients using the advanced setup, you are required to create two warehouses for Bobsled to utilize. Provide the names of each warehouse and their respective roles to your account team for correct account configuration. In the advanced setup, warehouses should be configured with multi-clustering enabled, standard scaling mode, and allowing up to 10 clusters. This configuration allows unloads to scale out as the number of tables unloaded increases.


Secure Views

When using secure views as a source, bobsled monitors the LAST_DDL timestamp from the INFORMATION_SCHEMA in order to determine when a view has been restated.  When Bobsled determines a change to a View’s LAST_DDL, Bobsled will mark the table for backfill on the next transfer.


Multi-tenant Filtering

Bobsled supports multi-tenant filtering, enabling you to control data access at the tenant level in a multi-tenant data environment. This feature ensures the right data gets to the right customer without having to manage custom views per customer. Learn more.


Was this article helpful?

What's Next