Databricks


Databricks is one of many delivery destinations that Bobsled supports. When delivering data from cloud object storage, Bobsled turns the selected folders into tables in Databricks. Using Databricks Delta Sharing, Bobsled creates a Databricks-to-Databricks Delta Share and grants the configured Databricks workspace(s) access to it. Each workspace can then create a read-only catalog from the share and query the data within that workspace.


Bobsled-managed Databricks

To learn how to configure a Databricks destination in Bobsled, see the Bobsled-managed Databricks setup guide.

Authorization

Bobsled requires the Databricks sharing identifier to grant your account access to the Delta Share. A Metastore Admin (or another user with the CREATE CATALOG and USE PROVIDER privileges) in the Databricks workspace that will consume the data must accept the Delta Share to make it available within the workspace. To learn more about the Databricks sharing identifier used within Bobsled, see Account Access Identifiers in Databricks.
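The sketch below illustrates the two consumer-side values involved in authorization. It assumes the sharing identifier has the form `<cloud>:<region>:<metastore-uuid>`, which is what Databricks returns when you run `SELECT CURRENT_METASTORE();` in the consumer workspace, and it builds the Databricks SQL statement (`CREATE CATALOG ... USING SHARE`) that an admin can use to mount the share as a read-only catalog. The helper names and the catalog, provider and share names in the usage example are hypothetical placeholders, not Bobsled APIs.

```python
import re
import uuid

# Assumed sharing identifier format: <cloud>:<region>:<metastore-uuid>,
# e.g. the value returned by `SELECT CURRENT_METASTORE();` in a Databricks workspace.
_SHARING_ID = re.compile(r"^(aws|azure|gcp):([a-z0-9-]+):([0-9a-fA-F-]{36})$")


def parse_sharing_identifier(sharing_id: str) -> tuple[str, str, str]:
    """Split a sharing identifier into (cloud, region, metastore_id); raise ValueError if malformed."""
    match = _SHARING_ID.match(sharing_id.strip())
    if match is None:
        raise ValueError(f"not a sharing identifier: {sharing_id!r}")
    cloud, region, metastore_id = match.groups()
    uuid.UUID(metastore_id)  # raises ValueError if the last part is not a well-formed UUID
    return cloud, region, metastore_id


def create_catalog_sql(catalog: str, provider: str, share: str) -> str:
    """Build the Databricks SQL that creates a read-only catalog from a received share."""
    return f"CREATE CATALOG IF NOT EXISTS `{catalog}` USING SHARE `{provider}`.`{share}`"


if __name__ == "__main__":
    print(parse_sharing_identifier("aws:us-west-2:19a84bee-54bc-43a2-87de-023d0ec16016"))
    # Hypothetical names: use the provider and share names shown in your workspace.
    print(create_catalog_sql("vendor_data", "bobsled_provider", "vendor_share"))
```

Validating the identifier before handing it to Bobsled catches copy-paste errors (for example, pasting only the metastore UUID) early.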


Bobsled supports advanced settings for data engineers who want to optimize how specific tables are delivered to Databricks.

Clustering

Bobsled supports setting the zOrder of a table in Databricks. Z-ordering co-locates rows with similar values in the chosen columns, so queries that filter on those columns can skip more data. This can be set via the Bobsled Application or API.
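In Databricks, Z-ordering is applied with the `OPTIMIZE ... ZORDER BY` SQL command. The minimal sketch below builds that statement to show what a zOrder selection amounts to on the Databricks side; it is not how Bobsled applies the setting internally, and the helper name and table name are hypothetical.

```python
def zorder_sql(table: str, columns: list[str]) -> str:
    """Build an OPTIMIZE ... ZORDER BY statement for a Delta table (hypothetical helper)."""
    if not columns:
        raise ValueError("at least one Z-order column is required")
    if len(set(columns)) != len(columns):
        raise ValueError("Z-order columns must be unique")
    # Backtick-quote each column so names with special characters stay valid.
    quoted = ", ".join(f"`{c}`" for c in columns)
    return f"OPTIMIZE {table} ZORDER BY ({quoted})"


if __name__ == "__main__":
    print(zorder_sql("main.sales.orders", ["order_date", "customer_id"]))
```

Choose the columns consumers filter on most often; Z-ordering on many columns dilutes the benefit for each one.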

Unsupported data types

Databricks clustering is supported only for primitive column types, and Bobsled enforces this limitation. The following Databricks data types aren't supported as cluster keys (see the Bobsled Data Type documentation for details):

  • BINARY

  • MAP

  • STRUCT

  • ARRAY

Complex or nested fields must be flattened or transformed using SQL before they can be used as cluster keys.

NOTE:
Bobsled validates cluster key selections during setup and rejects unsupported types in both the UI and the API.
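If you want to check candidate cluster keys before configuring them, a pre-check can mirror the rule above. The sketch assumes the table schema is available as a mapping from column name to Databricks type string (for example `ARRAY<STRING>` or `DECIMAL(10,2)`); the function names are hypothetical, and this is not Bobsled's own validation code.

```python
# Types listed above as unsupported for cluster keys.
UNSUPPORTED_CLUSTER_TYPES = {"BINARY", "MAP", "STRUCT", "ARRAY"}


def base_type(type_name: str) -> str:
    """Return the outer type name, e.g. 'ARRAY<STRING>' -> 'ARRAY', 'decimal(10,2)' -> 'DECIMAL'."""
    name = type_name.strip().upper()
    for sep in ("<", "("):
        name = name.split(sep, 1)[0]
    return name.strip()


def invalid_cluster_keys(schema: dict[str, str], keys: list[str]) -> dict[str, str]:
    """Map each rejected key to the reason it cannot be used as a cluster key."""
    problems = {}
    for key in keys:
        if key not in schema:
            problems[key] = "column not in schema"
        elif base_type(schema[key]) in UNSUPPORTED_CLUSTER_TYPES:
            problems[key] = f"unsupported type {schema[key]}"
    return problems


if __name__ == "__main__":
    schema = {"id": "BIGINT", "tags": "ARRAY<STRING>", "address": "STRUCT<city: STRING>"}
    print(invalid_cluster_keys(schema, ["id", "tags", "address"]))
```

A rejected nested field such as `address` can be flattened first, for example by projecting `address.city` into its own top-level STRING column, which can then be used as a cluster key.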

TIP:
If you are interested in using clustering to deliver optimized tables to your consumers but need assistance with the setup, please reach out to your account team.


Datatype override

Bobsled allows you to override a column's data type in your source schema with a different data type in the destination table.
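Conceptually, an override maps a source column to a new destination type, which in Databricks SQL corresponds to a `CAST`. The sketch below assumes schemas are given as column-name-to-type mappings and uses hypothetical helper and table names; it only illustrates the idea and is not how Bobsled performs overrides.

```python
def apply_type_overrides(source_schema: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Return the destination schema: source types, with overridden columns replaced."""
    unknown = sorted(set(overrides) - set(source_schema))
    if unknown:
        raise ValueError(f"override for unknown column(s): {unknown}")
    return {col: overrides.get(col, typ) for col, typ in source_schema.items()}


def select_with_casts(table: str, source_schema: dict[str, str], overrides: dict[str, str]) -> str:
    """Build a SELECT that casts overridden columns and passes the others through unchanged."""
    apply_type_overrides(source_schema, overrides)  # reuse the unknown-column check
    cols = [
        f"CAST(`{col}` AS {overrides[col]}) AS `{col}`" if col in overrides else f"`{col}`"
        for col in source_schema
    ]
    return f"SELECT {', '.join(cols)} FROM {table}"


if __name__ == "__main__":
    src = {"id": "STRING", "amount": "STRING", "ts": "TIMESTAMP"}
    ov = {"id": "BIGINT", "amount": "DECIMAL(18,2)"}
    print(apply_type_overrides(src, ov))
    print(select_with_casts("staging.raw_orders", src, ov))
```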

TIP:
If you wish to leverage data type overriding, please reach out to your account team.


Schema migration support

When new columns are added to tables or files, Bobsled handles the schema migration by adding the new columns to the existing tables without disrupting deliveries.

  • When new columns are introduced, they are added to the existing table, and rows loaded before the column existed contain null values in it.

    • This keeps data loading running through schema changes, preventing load failures and maintaining data integrity.

  • Our schema migration strategy is designed for flexibility and reliability during data structure evolution.

  • When columns are missing from new files, those columns are set to null for the newly loaded rows.
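The two null-filling rules above can be modeled in a few lines. In a Delta table, adding a column (for example with `ALTER TABLE ... ADD COLUMNS`) does not rewrite existing data; older rows simply read the new column as NULL. The sketch below assumes rows are plain dictionaries and uses `None` for NULL. It models what a consumer sees after a load and is not Bobsled's migration code; all names are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def evolve_columns(table_columns: list[str], file_columns: list[str]) -> list[str]:
    """Append columns first seen in the new file; existing columns are kept, never dropped."""
    return table_columns + [c for c in file_columns if c not in table_columns]


def align_row(row: Row, columns: list[str]) -> Row:
    """Project a row onto the table columns, filling any missing column with None (NULL)."""
    return {c: row.get(c) for c in columns}


def load(table_columns: list[str], table_rows: list[Row],
         file_columns: list[str], file_rows: list[Row]) -> tuple[list[str], list[Row]]:
    """Return the evolved column list and all rows as a query would see them after the load."""
    columns = evolve_columns(table_columns, file_columns)
    return columns, [align_row(r, columns) for r in table_rows + file_rows]


if __name__ == "__main__":
    # The new file adds "email" and no longer contains "name".
    cols, rows = load(["id", "name"], [{"id": 1, "name": "a"}],
                      ["id", "email"], [{"id": 2, "email": "b@example.com"}])
    print(cols)
    print(rows)
```

Running the example shows both rules at once: the existing row gets `email = None`, and the new row gets `name = None`.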


Consuming a data transfer

Once you’ve configured your destination in a share, granted access to a consumer, and transferred data, learn how to consume a data transfer via Native Databricks Sharing or Open Sharing.