Sequencing my genome

We’ve come a long way since the Human Genome Project. This effort to sequence the human genome for the first time was a tremendous achievement that lasted 13 (ish) years and cost $3 billion. Thanks to this pioneering work and major advancements over the years, it’s now possible to get your own genome sequenced for $300 in a few months. That’s a 10,000x decrease in cost! If the price of homes decreased at a similar rate since 1990, the median home would cost $7....

October 8, 2021 · 4 min · Peter Boone

NULL in BigQuery

When in doubt, NULL means “we don’t know the state of this thing so we can’t perform an operation on it.” Keeping track of how NULL values are handled in different SQL dialects can be tricky. This post will serve as a living document where I keep track of how BigQuery does this.

NULL and STRING comparison

Comparison always returns NULL.

    SELECT
      CAST(NULL AS STRING) = '', -- null
      CAST(NULL AS STRING) = 'value', -- null
      CAST(NULL AS STRING) = CAST(NULL AS STRING) -- null

NULL and BOOL comparison

NULL is not TRUE or FALSE....
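The NULL and STRING comparisons in this excerpt can be tried locally with a minimal sketch. It rests on one loud assumption: SQLite (through Python's built-in `sqlite3` module) stands in for BigQuery, so `TEXT` replaces `STRING`, and the `IS NULL` column is an addition for contrast that does not come from the post. Equality against NULL follows standard SQL three-valued logic in both engines, but other NULL handling may differ, so treat this as an illustration rather than a statement about BigQuery itself.

```python
import sqlite3

# Assumption: SQLite is used as a local stand-in for BigQuery.
# Python's None represents SQL NULL in the returned row.
conn = sqlite3.connect(":memory:")

query = """
SELECT
  CAST(NULL AS TEXT) = '',                  -- NULL
  CAST(NULL AS TEXT) = 'value',             -- NULL
  CAST(NULL AS TEXT) = CAST(NULL AS TEXT),  -- NULL
  CAST(NULL AS TEXT) IS NULL                -- 1 (true); added for contrast
"""
row = conn.execute(query).fetchone()
conn.close()

print(row)  # (None, None, None, 1)
```

The first three columns come back as `None` because `=` cannot decide equality when either side is unknown; only `IS NULL` gives a definite answer.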

September 14, 2021 · 7 min · Peter Boone

The practical use of repetition and definition levels in BigQuery

Google’s Dremel paper is an interesting read that explains some of the concepts that underlie BigQuery. I am still processing the paper and have noticed a few things about repetition and definition levels that are relevant to the everyday use of BigQuery. Columnar Data and Records The underlying storage format for BigQuery is columnar. One of the first pieces of advice given to people using BigQuery is to only select the columns that you need....

September 12, 2021 · 2 min · Peter Boone

Unnecessary BigQuery Optimization

I was reading a post about BigQuery cost optimization which stated this: “Also remember you are charged for bytes processed in the first stage of query execution. Avoid creating a complex multistage query just to optimize for bytes processed in the intermediate stages, since there are no cost implications anyway (though you may achieve performance gains).” I was curious about this, because I have written some queries to minimize reading data multiple times....

September 12, 2021 · 4 min · Peter Boone