BigQuery Resources

This is a living document of BigQuery resources that I have found useful (with some comments about their usefulness).

“Under the hood” articles

Anatomy of a BigQuery Query: Why is BigQuery cool?

BigQuery under the hood: High-level description of how BigQuery is able to be so fast.

In-memory query execution: If you like MapReduce, you should read this article. It explains what is unique about BigQuery’s shuffle.

Dremel: The paper about Dremel (the internal Google query engine that BigQuery is built on). If you’re interested in data structures, you should check out this paper and learn about repetition and definition levels.

How does Google compress the columns (it’s complicated and depends on what your data looks like)
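To make the repetition and definition levels from the Dremel paper a bit more concrete, here is a minimal sketch in Python. It is hard-coded to a single column path, Name.Language.Code, where Name and Language are repeated and Code is required, mirroring the example schema in the paper. The function name `stripe_code` and the dict-based record layout are my own assumptions for illustration, not anything BigQuery exposes.

```python
def stripe_code(record):
    """Return (value, repetition_level, definition_level) triples
    for the column path Name.Language.Code of one record.

    Assumed record shape (hypothetical, for illustration only):
        {"Name": [{"Language": [{"Code": "en-us"}, ...]}, ...]}
    Name and Language are repeated fields; Code is required.
    """
    out = []
    names = record.get("Name", [])
    if not names:
        # Nothing on the path is defined: definition level 0.
        out.append((None, 0, 0))
        return out
    for i, name in enumerate(names):
        # The first Name starts a new record (level 0); later ones repeat at Name (level 1).
        r_name = 0 if i == 0 else 1
        langs = name.get("Language", [])
        if not langs:
            # Name is defined but Language is not: definition level 1.
            out.append((None, r_name, 1))
            continue
        for j, lang in enumerate(langs):
            # The first Language inherits the Name's level; later ones repeat at Language (level 2).
            r = r_name if j == 0 else 2
            out.append((lang["Code"], r, 2))
    return out


r1 = {
    "Name": [
        {"Language": [{"Code": "en-us"}, {"Code": "en"}]},
        {},
        {"Language": [{"Code": "en-gb"}]},
    ]
}
print(stripe_code(r1))
# [('en-us', 0, 2), ('en', 2, 2), (None, 1, 1), ('en-gb', 1, 2)]
```

The repetition level says at which repeated field in the path the value repeats, and the definition level says how many optional or repeated fields on the path are actually present. Together they let a flat column be reassembled into nested records.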

Performance

IO best practices: Mainly, avoid SELECT * queries.

Communication performance: Reduce data before joins, and use WITH clauses primarily for readability (they aren’t materialized).

Optimize query computation: Practical tips for how to make BigQuery queries faster.

SQL anti-patterns: A list of common SQL anti-patterns to avoid.
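The advice to avoid SELECT * follows from column-oriented storage: a query only has to read the columns it references. The sketch below is a toy model in Python, not BigQuery's actual storage or billing logic. The table contents and the helper names `column_bytes` and `bytes_scanned` are made up for illustration.

```python
# Toy columnar table: each column is stored separately (hypothetical data).
TABLE = {
    "user_id": [1, 2, 3],
    "country": ["NL", "US", "BR"],
    "payload": ["x" * 1000, "y" * 1000, "z" * 1000],
}


def column_bytes(values):
    # Crude size estimate: UTF-8 length of each value's string form.
    return sum(len(str(v).encode("utf-8")) for v in values)


def bytes_scanned(table, selected):
    # "*" reads every column; otherwise only the named columns are read.
    cols = list(table) if selected == "*" else selected
    return sum(column_bytes(table[c]) for c in cols)


print(bytes_scanned(TABLE, ["user_id", "country"]))  # 9
print(bytes_scanned(TABLE, "*"))                     # 3009
```

In this toy model, one wide column dominates the cost, and naming only the columns you need skips it entirely.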

Style

dbt style guide: Some good general advice that is applicable to BigQuery.

GitLab style guide

My articles

BigQuery records

Null in BigQuery

Unnecessary BigQuery optimization
