- 19 Dec 2024
- 4 Minutes to read
- PDF
Setup guide
- Updated on 19 Dec 2024
- 4 Minutes to read
- PDF
This article will describe the steps for a Provider to configure Bobsled access to a Snowflake to be used as a source in Bobsled.
Prerequisites
To configure Bobsled access to your source, access to the
ACCOUNTADMIN
role is required.
Setup instructions
Step 1: Grant access to your Snowflake account
Bobsled supports two modes of Snowflake source setup:
Standard setup and,
Advanced setup.
In the Bobsled app:
Navigate to Data sources
Select Snowflake, the cloud and the region of your Snowflake account. Click Continue to move to the next step
Choose your setup mode: Standard or Advanced and follow the steps on the screen to run the associated script or adapt to your needs.
Option 1: Standard setup
This is the easiest way to setup a Snowflake Source. This requires running a script to grant the correct access to your Snowflake account. The script does the following:
Creates a user
Creates a role for that user to use
Grants the role the ability to:
Create integrations
Create warehouses
Creates a security integration for authentication assigned to the role
Creates a database for Bobsled to use
Creates a stored procedure to be used for granting proper access to the databases that Bobsled needs to access (Bobsled can access multiple databases within one source). When using the stored procedure, the user needs to enter databases that Bobsled should have access to. The stored procedure does the following:
Grants the role usage to all current and future schemas and select to all current and future tables and views
Turns change tracking on for all tables and views in those databases. Bobsled uses change tracking for efficient extracts.
Option 2: Advanced setup
Advanced setup can be chosen by a Snowflake account admin who wants more control over Bobsled's access and footprint within their Snowflake account. Instead of granting Bobsled the ability to create the required infrastructure, all infrastructure can be created and then access granted to Bobsled.
Bobsled provides a script—accessible via the source setup page—, available in SQL or Terraform, to run for setting up required Bobsled infrastructure or adapting per your needs. The terraform script is generated specific to your Bobsled deployment and the specific source setup.
Select advanced setup,
Use Bobsled default values, or adapt and replace them to your needs.
Run the script.
Step 2: Finish Bobsled configuration
Depending on the route chosen to setup Bobsled access, you need finish configuring the source in the Bobsled App after running the relevant scrips.
Option 1: For standard setup
After you’ve run the script:
Provide your
Snowflake Organization name
,Account name
(asorg_name.account_name
) andAccount locator
. All values must be entered UPPERCASE.Easily obtain these values in snowflake with the following script:
SELECT CURRENT_ORGANIZATION_NAME() || '.' || CURRENT_ACCOUNT_NAME() as "ORG.ACCOUNT", CURRENT_ACCOUNT() as LOCATOR;
Learn more on how to access these account identifiers ↗
Choose your Scale Snowflake Warehouse Source Computing sizes for Incremental Loading and Backfill loading. Note that in the standard setup, Bobsled sets these values by default. Learn more about the scaling strategy for your use case.
Once you’ve entered all the information, click Save.
Option 2: For advanced setup
After you’ve run the script:
Provide your:
Snowflake Organization name
,Account name
(asorg_name.account_name
) andAccount locator
. Learn more on how to access these account identifiers ↗
Enter the integration details, running the SQL command to fetch the details for the following:
Snowflake storage integration IAM User Role
Snowflake storage integration AWS External ID
And confirm the following:
Snowflake storage integration AWS Role ARN
Snowflake storage allowed locations
Once you’ve entered all the information, click Save.
Once you’ve successfully configured your Source, you can add it to a Share, choose a Destination, and a create a transfer to start sharing your data.
Warehouse compute scaling
Scaling Strategy
Bobsled aims to complete data unloads within 30 minutes. This is because Snowflake can scale unloads linearly, allowing us to provide data to our customers quickly and facilitating fast feedback loops for delivery teams.
Snowflake's performance allows it to unload 100 GB on an XS (one compute node) in approximately 10 minutes. This linear scalability results in the following performance metrics:
10 GB: Approximately 1 minute on XS
100 GB: Approximately 10 minutes on XS
500 GB: Approximately 10 minutes on Medium
1 TB: Approximately 15 minutes on Large
8 TB: Approximately 30 minutes on 2XLarge
Configuration Options
Bobsled offers support for both a backfill-focused warehouse and an incremental warehouse. This configuration enables small incremental syncs on a small warehouse while using a large warehouse for substantial initial loads and backfills.
For clients utilizing the standard setup, Bobsled will create the necessary warehouses in your Snowflake account. Collaborate with your account team to ensure the warehouses are appropriately sized.
For clients using the advanced setup, you are required to create two warehouses for Bobsled to utilize. Provide the names of each warehouse and their respective roles to your account team for correct account configuration. In the advanced setup, warehouses should be configured with multi-clustering enabled, standard scaling mode, and allowing up to 10 clusters. This configuration allows unloads to scale out as the number of tables unloaded increases.
Secure Views
When using secure views as a source, bobsled monitors the LAST_DDL
timestamp from the INFORMATION_SCHEMA
in order to determine when a view has been restated. When Bobsled determines a change to a View’s LAST_DDL
, Bobsled will mark the table for backfill on the next transfer.
Multi-tenant Filtering
Bobsled supports multi-tenant filtering, enabling you to control data access at the tenant level in a multi-tenant data environment. This feature ensures the right data gets to the right customer without having to manage custom views per customer. Learn more.