physicsguy 7 hours ago

You can get instances with 32 TiB of RAM on AWS these days

mr_toad 3 hours ago | parent | next [-]

Which is a lot for a single user, but when you have dozens or hundreds of analysts who all want to run their own jobs on your hundred-terabyte data warehouse, even the largest single machine won't cut it.

RobinL 7 hours ago | parent | prev | next [-]

Exactly - these huge machines are surely eating a lot into the need for distributed systems like Spark. So much less of a headache to run as well.

jeffbee 6 hours ago | parent | prev [-]

That sounds damned near useless for typical data analysis purposes, and I would very much prefer a distributed system to one that would take an hour to fill main memory over its tiny network port. Also, those cost $400/hr and are specifically designed for businesses that have backed themselves into a corner of needing to run a huge SAP HANA instance. I doubt they would even sell you one before you prove you have an SAP license.

For a tiny fraction of the cost you can get numerous nodes with 600 Gbps Ethernet ports that can fill their memory in seconds.
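
To put rough numbers on the fill-time argument, here is a back-of-the-envelope sketch. The 100 Gbps port on the 32 TiB machine and the 512 GiB of RAM per smaller node are assumptions for illustration only (neither figure is given above), and it assumes the link runs at full line rate with no protocol overhead.

```python
TIB = 2**40  # bytes in a tebibyte
GIB = 2**30  # bytes in a gibibyte


def fill_time_seconds(memory_bytes: float, link_gbps: float) -> float:
    """Seconds to move memory_bytes over a link of link_gbps at full line rate."""
    bits = memory_bytes * 8
    return bits / (link_gbps * 1e9)


# Assumption: the 32 TiB instance has a 100 Gbps network port.
big = fill_time_seconds(32 * TIB, 100)

# Assumption: a smaller node has 512 GiB of RAM; 600 Gbps is the figure quoted above.
small = fill_time_seconds(512 * GIB, 600)

print(f"32 TiB over 100 Gbps: {big / 60:.0f} minutes")
print(f"512 GiB over 600 Gbps: {small:.1f} seconds")
```

Under those assumptions the big machine needs roughly 47 minutes to load its memory, while each smaller node needs about 7 seconds, and the smaller nodes load in parallel.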