Remix.run Logo
fritzo 2 days ago

Have worst case optimal join algorithms made a practical impact, since the linked article's first publication in 2015? I've seen them in the context of egglog, but are they used in real world database management systems?

https://en.wikipedia.org/wiki/Worst-case_optimal_join_algori...

tylerhou 2 days ago | parent | next [-]

WCOJs guarantee an (asymptotic) upper bound on join complexity, but often with the right join plan and appropriate cardinality estimations, you can do much better than WCOJ.

The runtime of WCOJs algorithms are even more dependent on good cardinality estimation. For instance, in VAAT, the main difficulty to find an appropriate variable ordering, which relies on knowledge about cardinalities conditioned on particular variables having particular values. If you have the wrong ordering, you still achieve worst case optimal, but you could have done far better in some cases with other algorithms (e.g. Yannakakis algorithm for acyclic queries). And as far as I know, many DBMSes do not keep track of this type of conditional cardinality, so it is unlikely that existing WCOJ will be faster in practice.

The new hotness is "instance optimal" joins...

namibj 2 days ago | parent | prev | next [-]

Materialize.com runs on those techniques; not sure how far this has gotten into the query planner yet, though. It's been a while since I looked at those details for it.

I'm pretty sure some datalog (adjacent?) but otherwise quite proprietary solution (that might be datadog (or is similar enough that I'd have to go search old notes/chats to determine the details)) uses the technologies and was mentioned as affiliation for at least one author of at least one paper in that space; as of about early 2024.

Edit: likely was https://en.wikipedia.org/wiki/LogicBlox

rendaw 2 days ago | parent | prev [-]

I'm a layman, but I was really hoping to see some actual benchmark results in the paper.