Remix.run Logo
willvarfar 3 days ago

Interesting reading!

Does this explain a big inefficiency in BigQuery where it only ever does hash-joins? Is it because it is push so it never does merge-joins even when the inputs are all sorted or clustered etc?

Although tbh can't see why a system can't combine them both; some of the edges being push+buffer and some being buffer+pull.

briankelly 2 days ago | parent [-]

If you read Extensible Query Optimizers in Practice, they explain why [0] at a high level and link to further reading.

> Although both Snowflake and BigQuery do not consider as many alternatives as Fabric DW, the dynamic schemes of the above systems avoids the potential bad plans chosen by the optimizer due to errors in cost and cardinality estimations.

I understand it's mostly due to the difficulty of getting the cost estimation reliably correct and so it defaults to something predictable to simplify the plan search as well.

[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...