Snowflake

Prev Next

Snowflake is one of many delivery destinations that Bobsled supports. When delivering data from cloud object storage, Bobsled will turn the selected folders into tables in Snowflake. Utilizing Snowflake Secure Data Sharing, Bobsled creates a Snowflake data share in which the configured Snowflake account(s) are granted access to create a read-only database from the share to query within their account.


Bobsled-managed Snowflake

To learn how to configure a Snowflake destination in Bobsled, please visit Bobsled-managed Snowflake setup guide.

Authorization

Bobsled requires Snowflake organization name and Snowflake account name to grant your account access to the Snowflake share. An Account Admin (or another user with import share privilege) on the Snowflake account that will consume the data must accept the data share to make it available within the account.

To learn more about the Snowflake access identifier used within Bobsled please visit: Account Access Identifiers in Snowflake.


Bobsled supports various advanced settings to further control how tables are delivered in Snowflake.

Clustering keys

Bobsled supports the setting of cluster keys ↗ in Snowflake, resulting in optimized tables for expected query patterns. This can be set via the Bobsled Application or API ↗

Unsupported data types

Bobsled restricts usage of certain data types that are not compatible with Snowflake’s clustering engine. Follow the Bobsled Data Type documentation to ascertain which Snowflake Data types aren’t supported:

  • ARRAY

  • OBJECT

  • VARIANT

These types are considered semi-structured and are not allowed as clustering keys.

NOTE:
Bobsled validates cluster key selections during setup and prevents use of unsupported types via UI or API.

Additional notes

To lower cardinality and support more efficient clustering maintenance, transformations are applied to certain column types following Snowflake best practices. The column types and their corresponding transformations are as follows:

  • TIMESTAMP columns are converted to DATE using TO_DATE().

  • GEOGRAPHY columns are converted to strings via ST_GEOHASH().

TIP:
If you are interested in using clustering to deliver optimized tables to your consumers but need assistance with the setup, please reach out to your account team.


Datatype override

Bobsled allows you to override a column's data type in your source schema with a different data type in the destination table. This functionality is primarily used for certain geospatial data types, which are available in Snowflake but not specifiable in Parquet.

TIP:
If you wish to leverage data type overriding, please reach out to your account team.


Search optimization

Bobsled supports the setting of search optimization ↗ in Snowflake. This service aims to significantly improve the performance of certain types of queries on tables for your customers.

TIP:
If are interested in using search optimization, please reach out to your account team.


Schema migration support

When new columns are added to tables or files, Bobsled efficiently handles schema migrations by adding new columns to existing tables without disrupting deliveries.

  • When new columns are introduced, they're seamlessly integrated, and any missing data in these columns is defaulted to null values.

    • This approach ensures that data loading continues smoothly, even with schema changes, preventing load failures and maintaining data integrity.

  • Our schema migration strategy is designed for flexibility and reliability during data structure evolution.

  • When columns aren't present in new files, the value for missing columns is set to null.


Accessing data

Once your data has been shared to the destination, learn how to access it in Snowflake