willvarfar 10 hours ago

Seriously, this is not what big data does today. Distributed query engines don't have the primitives to zip through two tables and treat them as column groups of the same wider logical table. There's a new kid on the block called LanceDB that has some of the same features but is aimed at different use cases. My trick retrofits vertical partitioning onto mainstream data lake stacks. It's generic: it works on the stack my company uses, and it would also work on all the mainstream alternatives. It's slightly slower on AWS. But anyway. I guess HN just wants to see an industrial-track paper.
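The comment doesn't spell out the trick itself. As a rough illustration of what "zipping" two tables into column groups of one wider logical table means, here is a minimal in-memory sketch. It is not the author's method: the function name `zip_column_groups` and the dict-of-lists layout are hypothetical, and a real engine would do this over columnar files rather than Python lists. The key assumption is that every group stores the same rows in the same order, so rows line up by position instead of by a join key.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical layout: a column group maps column names to equal-length lists.
ColumnGroup = Dict[str, List[Any]]


def zip_column_groups(*groups: ColumnGroup) -> Iterator[Dict[str, Any]]:
    """Yield rows of the wider logical table by reading all groups in lockstep.

    Assumes row i of every group describes the same entity, so no join key
    is needed; alignment is purely positional.
    """
    # Positional alignment only works if every column has the same row count.
    lengths = {len(col) for g in groups for col in g.values()}
    if len(lengths) > 1:
        raise ValueError(f"column groups have mismatched row counts: {sorted(lengths)}")

    # Column names must not collide once the groups are stitched together.
    names = [name for g in groups for name in g]
    if len(names) != len(set(names)):
        raise ValueError("column names must be unique across groups")

    columns = [col for g in groups for col in g.values()]
    for values in zip(*columns):
        yield dict(zip(names, values))


if __name__ == "__main__":
    base = {"id": [1, 2, 3], "name": ["a", "b", "c"]}
    extra = {"score": [0.5, 0.7, 0.9]}  # e.g. a column group written later
    rows = list(zip_column_groups(base, extra))
    print(rows[1])  # {'id': 2, 'name': 'b', 'score': 0.7}
```

The point of the positional zip is that adding a new column group never requires rewriting the existing one or joining on a key; the cost is that both groups must be kept in exactly the same row order.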