dinobones | 3 days ago
I always see these fancy DB engines and data-lake blog posts and I'm curious… why? At every place I've worked, this is a solved problem: Hive+Spark, just keep everything sharded across a ton of machines. It's cheaper to pay for a Hive cluster that does dumb queries than to pay for expensive DB licenses, data engineers building arbitrary indices, etc. Just throw compute at the problem; who cares, 1TB of RAM/flash is so cheap these days. Even working on the world's "biggest platforms," a daily partition of user data is like 2TB. You're telling me an F500 can't buy a 5-machine/40TB cluster for like $40k and basically be set?
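For reference, the "dumb queries over daily partitions" pattern described here looks roughly like this in PySpark; the table and column names (events, ds, user_id) and the date are made up for illustration:

    # Minimal sketch of the "throw compute at it" pattern: Spark SQL
    # over a Hive table partitioned by day, no custom indices.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("daily-partition-scan")
        .enableHiveSupport()  # read tables from the Hive metastore
        .getOrCreate()
    )

    # Partition pruning on ds limits the scan to one daily partition
    # (the ~2TB mentioned above); the cluster brute-forces the rest.
    daily_active = spark.sql("""
        SELECT user_id, COUNT(*) AS event_count
        FROM events
        WHERE ds = '2024-01-15'
        GROUP BY user_id
    """)

    daily_active.write.mode("overwrite").parquet("/tmp/daily_active")

The tradeoff is exactly what the comment implies: you pay in full scans of a partition rather than in index maintenance or license fees.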
pragmatic | 2 days ago
A fellow data swamp enjoyer. "Just dump it in Hadoop" became an anti-pattern, and everyone yearned for databases, clean data, and not having to deal with internal IT and the cluster "admins".