Data Sharing


What is data sharing?

Data sharing is the practice of granting access to data products to other parties, systems, or organizations while maintaining appropriate governance, security, and access controls. With data sharing, a provider entitles a consumer to a secure, read-only view of a file or database that can be analyzed using the consumer's own compute.

Each cloud data and storage platform has its own "sharing protocol" for sharing data between accounts. Most are proprietary, although some, such as Delta Sharing, are open. All of them follow the same basic principles:

Principles of Data Sharing

  • In-place sharing: In a data share, the consumer uses their own compute to query data stored and managed by the provider

  • Credential-free permissioning: Data sharing uses public identifiers to permission access rather than requiring parties to exchange private credentials

  • Native access: Consumers access the data "natively" within their analytics platform rather than receiving the data through traditional delivery methods like APIs or file transfers
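The three principles above can be illustrated with a small model. This is a minimal sketch, not any platform's actual API: the `Share` class, its methods, and the sample identifier are all hypothetical names chosen for illustration. It shows a provider granting access by a consumer's public identifier (no secret is exchanged) and the consumer receiving read-only views of rows that stay in the provider's storage.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class Share:
    """Hypothetical model of a provider-managed, read-only data share."""

    name: str
    rows: list[dict]  # data stays in the provider's storage
    grants: set[str] = field(default_factory=set)  # consumers' public identifiers

    def grant(self, consumer_id: str) -> None:
        # Credential-free permissioning: only a public identifier is recorded.
        self.grants.add(consumer_id)

    def revoke(self, consumer_id: str) -> None:
        # The provider keeps control and can withdraw access at any time.
        self.grants.discard(consumer_id)

    def read(self, consumer_id: str) -> tuple:
        # In-place sharing: return read-only views, not copies, of the rows.
        if consumer_id not in self.grants:
            raise PermissionError(f"{consumer_id} has no access to {self.name}")
        return tuple(MappingProxyType(row) for row in self.rows)


share = Share("sales", [{"sku": "A1", "units": 3}])
share.grant("123456789012")  # made-up consumer identifier
rows = share.read("123456789012")
```

Because `read` hands back views rather than copies, a later change by the provider is visible to the consumer without any redelivery, while the consumer cannot modify the provider's data.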

What are the advantages of data sharing?

Data sharing offers several advantages over legacy forms of data delivery, such as APIs, database connectors, and SFTP. The combination of credential-free permissioning, in-place sharing, and native access allows data providers to deliver a seamless experience to users while maintaining significant control over their products.

Advantages of Data Sharing

  • Faster time-to-insight: Data sharing eliminates the need for consumers to integrate data into their platforms, significantly reducing the time-to-insight for data products

  • Improved governance: Providers retain control over the data product, which means they can easily monitor and manage consumers' access

  • Better analytics: Sharing protocols provide telemetry on consumer usage that can be used to optimize and manage data products

  • Reduced costs: Data sharing eliminates duplicate storage and minimizes data transfer costs

How does data sharing work in Bobsled?

Bobsled uses the data sharing protocol of the destination platform to securely grant consumers access to data. Since these protocols are "native" to the destination platform, consumers can access the data instantly within their workspace.

Every protocol requires the provider to authorize consumers using a "public identifier", but the type of identifier varies by platform.

Bobsled Destinations, Sharing Protocols & Required Identifiers

| Destination | Sharing Protocol Used | Identifiers Required |
| --- | --- | --- |
| Snowflake (Bobsled docs) | Snowflake Secure Data Sharing (Snowflake docs) | Organization Name: A Snowflake object that links accounts owned by your business entity, automatically created with a system-generated name or assigned by Snowflake personnel (Snowflake docs) |
| Databricks (Bobsled docs) | Delta Sharing (Databricks docs) | Metastore ID: A UUID that uniquely identifies the top-level container for data in Unity Catalog, extracted from the sharing identifier format cloud:region:uuid (Databricks docs) |
| BigQuery (Bobsled docs) | BigQuery Analytics Hub (Google docs) | Google Principal: An IAM principal that can be a Google Account (email address), Service Account (for applications), or Google Group, used to grant access to BigQuery resources (Google docs) |
| Amazon Redshift (Bobsled docs) | Redshift Data Sharing (AWS docs) | AWS Account ID: A 12-digit number that uniquely identifies an AWS account and distinguishes resources in one account from another (AWS docs) |
| AWS S3 (Bobsled docs) | S3 API with Zero-copy architecture (AWS docs) | ARN (Amazon Resource Name): A unique identifier that specifies AWS resources unambiguously across all of AWS, required for IAM policies and API calls (AWS docs) |
| Azure Storage (Bobsled docs) | Azure Blob Storage API (Microsoft docs) | Storage Account Name: Unique name of the Azure storage account (Microsoft docs) |
| Google Cloud Storage (Bobsled docs) | GCS API (Google docs) | Google Principal: An IAM principal that can be a Google Account (email address), Service Account (for applications), or Google Group, used to grant access to Cloud Storage resources (Google docs) |
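Two of the identifier formats described above are precise enough to check before submitting them. The following is a minimal sketch in Python, assuming only what the descriptions state: an AWS Account ID is a 12-digit number, and a Databricks sharing identifier has the form cloud:region:uuid, with the Metastore ID as its last part. The function names, the lookup dictionary, and the sample values are hypothetical and are not part of Bobsled or of any platform SDK.

```python
import re
import uuid

# Identifier required per destination, as listed in the table above.
REQUIRED_IDENTIFIER = {
    "Snowflake": "Organization Name",
    "Databricks": "Metastore ID",
    "BigQuery": "Google Principal",
    "Amazon Redshift": "AWS Account ID",
    "AWS S3": "ARN",
    "Azure Storage": "Storage Account Name",
    "Google Cloud Storage": "Google Principal",
}

AWS_ACCOUNT_ID = re.compile(r"[0-9]{12}")


def is_aws_account_id(value: str) -> bool:
    """Return True if the value is exactly 12 ASCII digits."""
    return AWS_ACCOUNT_ID.fullmatch(value) is not None


def metastore_id_from_sharing_identifier(sharing_identifier: str) -> str:
    """Extract the UUID from a 'cloud:region:uuid' sharing identifier."""
    parts = sharing_identifier.split(":")
    if len(parts) != 3:
        raise ValueError("expected the form cloud:region:uuid")
    # uuid.UUID raises ValueError if the last part is not a valid UUID.
    return str(uuid.UUID(parts[2]))


# Made-up sample values, for illustration only.
sample_account_ok = is_aws_account_id("123456789012")
sample_metastore_id = metastore_id_from_sharing_identifier(
    "aws:us-west-2:01234567-89ab-cdef-0123-456789abcdef"
)
```

Checks like these only catch formatting mistakes. Whether an identifier actually belongs to the intended consumer still has to be confirmed with that consumer.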

Does Bobsled support ETL-based delivery?

Bobsled exclusively uses data sharing when delivering to any data warehouse, but supports ETL-based delivery to specific cloud storage destinations. With these "external bucket" options, users can directly write data to a consumer's file storage.

For more details, see the setup guides:

  • External S3 bucket setup guide (Link)

  • External Azure Storage setup guide (Link)
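The delivery rule described in this section can be summarized as a small lookup. This is a sketch under stated assumptions, not Bobsled's implementation: the grouping of destinations into warehouses and cloud storage is inferred from the destination list on this page, and the set of external-bucket destinations is assumed to be the two with setup guides listed above. The function name and mode labels are hypothetical.

```python
# Assumption: grouping inferred from the destinations listed on this page.
WAREHOUSES = {"Snowflake", "Databricks", "BigQuery", "Amazon Redshift"}
CLOUD_STORAGE = {"AWS S3", "Azure Storage", "Google Cloud Storage"}

# Assumption: only destinations with an external-bucket setup guide listed above.
EXTERNAL_BUCKET = {"AWS S3", "Azure Storage"}


def delivery_modes(destination: str) -> list[str]:
    """Return the delivery modes available for a destination."""
    if destination in WAREHOUSES:
        # Warehouses are delivered to exclusively through data sharing.
        return ["data sharing"]
    if destination in CLOUD_STORAGE:
        modes = ["data sharing"]
        if destination in EXTERNAL_BUCKET:
            modes.append("ETL (external bucket)")
        return modes
    raise ValueError(f"unknown destination: {destination}")
```

For example, `delivery_modes("Snowflake")` returns only data sharing, while `delivery_modes("AWS S3")` also includes the external-bucket option.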