Google BigQuery


Google BigQuery (GBQ) is one of many delivery destinations that Bobsled supports. When delivering data from cloud object storage, Bobsled turns the selected folders into tables in BigQuery. To share the data, Bobsled uses Google Analytics Hub data exchanges. Each Google principal granted access to the Bobsled share is authorized to access the shared datasets in the data exchange. Data consumers can then subscribe to the data exchange's listing and query the data from within their own projects.

KNOWN LIMITATIONS:
Bobsled data transfers to BigQuery are currently limited to 10,000 source files per table.
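
As a rough pre-flight check against this limit, the sketch below counts files per folder from a list of object keys and reports folders that exceed 10,000 files. It assumes the first path segment of each key is the folder that becomes a table; the function name and input shape are illustrative and not part of Bobsled.

```python
from collections import Counter

# Limit stated in the Known Limitations note above.
MAX_SOURCE_FILES_PER_TABLE = 10_000


def tables_over_limit(object_keys, limit=MAX_SOURCE_FILES_PER_TABLE):
    """Return {folder: file_count} for folders holding more than `limit` files.

    Assumption: the first path segment of each key is the folder that
    becomes a table. Keys with no folder, and folder placeholder keys
    ending in "/", are ignored.
    """
    counts = Counter(
        key.split("/", 1)[0]
        for key in object_keys
        if "/" in key and not key.endswith("/")
    )
    return {folder: n for folder, n in counts.items() if n > limit}


# Example with synthetic keys: "orders" has 10,001 files, "users" has 3.
keys = [f"orders/part-{i}.parquet" for i in range(10_001)]
keys += [f"users/part-{i}.parquet" for i in range(3)]
print(tables_over_limit(keys))  # {'orders': 10001}
```

In practice the key list would come from a listing of your source bucket; a folder at exactly 10,000 files is treated as within the limit here.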


Bobsled-managed Google BigQuery

To learn how to configure a Google BigQuery destination in Bobsled, see the Bobsled-managed GBQ setup guide.

Authorization

Bobsled requires the consumer's Google principal(s) in order to grant access to the Analytics Hub data exchange. To learn more about the Google BigQuery sharing identifier used within Bobsled, see Account Access Identifiers in Google Cloud. For more options, see the advanced destination table settings section.


Advanced destination table settings

Bobsled supports various advanced settings to further control how tables are delivered in BigQuery.

Clustering

Bobsled supports setting clustering in BigQuery, which optimizes tables for expected query patterns. Clustering can be configured via the Bobsled application or the API.

Unsupported data types

Google BigQuery supports clustering on flat columns only, and clustering is most effective on high-cardinality columns. The following BigQuery data types are not supported as cluster keys (see the Bobsled Data Type documentation for details):

  • ARRAY

  • FLOAT64

  • BYTES

  • JSON

  • STRUCT

  • TIME

Best practices:

  • Use flat, scalar columns.

  • Consider materializing or flattening nested fields before clustering.

NOTE:
Bobsled validates cluster key selections during setup and prevents use of unsupported types via UI or API.
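
If you want to check a proposed set of cluster keys before configuring them, a minimal sketch along the following lines may help. It only mirrors the list of unsupported types above; the function name and the schema shape (a dict of column name to BigQuery type name) are assumptions for illustration and do not represent Bobsled's own validation.

```python
# Types listed above as unsupported for cluster keys.
UNSUPPORTED_CLUSTER_TYPES = {"ARRAY", "FLOAT64", "BYTES", "JSON", "STRUCT", "TIME"}


def invalid_cluster_keys(schema, cluster_keys):
    """Return a list of (column, reason) pairs for keys that would be rejected.

    `schema` maps column name -> BigQuery type name (hypothetical shape).
    """
    problems = []
    for key in cluster_keys:
        if key not in schema:
            problems.append((key, "column not in schema"))
            continue
        # Reduce e.g. "ARRAY<STRING>" or "struct<a INT64>" to the base type.
        base_type = schema[key].split("<", 1)[0].strip().upper()
        if base_type in UNSUPPORTED_CLUSTER_TYPES:
            problems.append((key, f"unsupported type {base_type}"))
    return problems


schema = {
    "customer_id": "STRING",
    "price": "FLOAT64",
    "tags": "ARRAY<STRING>",
    "created_at": "TIMESTAMP",
}
print(invalid_cluster_keys(schema, ["customer_id", "created_at"]))  # []
print(invalid_cluster_keys(schema, ["price", "tags"]))
# [('price', 'unsupported type FLOAT64'), ('tags', 'unsupported type ARRAY')]
```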

TIP:
If you are interested in using clustering to deliver optimized tables to your consumers but need assistance with the setup, please reach out to your account team.


Datatype override

Bobsled allows you to override a column's data type in your source schema with a different data type in the destination table. This functionality is primarily used for certain geospatial data types, which are available in BigQuery but cannot be specified in Parquet.

TIP:
If you wish to leverage data type overriding, please reach out to your account team.
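
Conceptually, an override replaces the type of selected columns while leaving the rest of the schema unchanged. The sketch below illustrates that idea only: the function, the column names, and the dict-based schema are hypothetical, and the actual override is configured by your account team rather than through code like this. It assumes a geospatial column stored as a string in the source and delivered as BigQuery's GEOGRAPHY type.

```python
def apply_type_overrides(source_schema, overrides):
    """Return the destination schema after applying per-column overrides.

    Both arguments map column name -> type name (hypothetical shape).
    Raises KeyError if an override names a column missing from the source.
    """
    unknown = set(overrides) - set(source_schema)
    if unknown:
        raise KeyError(f"override for unknown column(s): {sorted(unknown)}")
    return {col: overrides.get(col, typ) for col, typ in source_schema.items()}


source_schema = {"store_id": "INT64", "location_wkt": "STRING"}
overrides = {"location_wkt": "GEOGRAPHY"}  # illustrative column name
print(apply_type_overrides(source_schema, overrides))
# {'store_id': 'INT64', 'location_wkt': 'GEOGRAPHY'}
```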


Schema migration support

When new columns appear in source tables or files, Bobsled handles the schema migration by adding those columns to the existing destination table without disrupting deliveries.

  • New columns are added to the existing table, and rows that have no value for them are set to null.

    • This approach ensures that data loading continues smoothly, even with schema changes, preventing load failures and maintaining data integrity.

  • Our schema migration strategy is designed for flexibility and reliability during data structure evolution.

  • When columns aren't present in new files, the value for missing columns is set to null.
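
The null-filling behavior described above can be pictured with a small sketch. This is an illustration in plain Python under simplified assumptions (rows as dicts, schemas as ordered lists of column names), not Bobsled's implementation; the helper names are made up for the example.

```python
def merge_schema(table_columns, file_columns):
    """Append columns seen for the first time, keeping the existing order."""
    merged = list(table_columns)
    for col in file_columns:
        if col not in merged:
            merged.append(col)
    return merged


def align_rows(rows, columns):
    """Project each row (a dict) onto `columns`, filling gaps with None (null)."""
    return [{col: row.get(col) for col in columns} for row in rows]


table_columns = ["id", "name"]
existing_rows = [{"id": 1, "name": "a"}]
# A new file adds "region" and no longer carries "name".
new_file_rows = [{"id": 2, "region": "EU"}]

columns = merge_schema(table_columns, ["id", "region"])
print(columns)  # ['id', 'name', 'region']
print(align_rows(existing_rows + new_file_rows, columns))
# [{'id': 1, 'name': 'a', 'region': None}, {'id': 2, 'name': None, 'region': 'EU'}]
```

The first output row shows an older row receiving null for the newly added column; the second shows a new row receiving null for a column its file did not include.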


Consuming a data transfer

Once you’ve configured your destination in a share, granted access to a consumer, and transferred data, learn how to consume a data transfer in Google BigQuery.