Zambyte 6 hours ago

Processing data that cannot be processed on a single machine is fundamentally a different problem than processing data that can be processed on a single machine. It's useful to have a term for that.

As you say, single machines can scale up incredibly far. That just means 16 TB datasets no longer demand big data solutions.

speedgoose 6 hours ago

I get your point, but I don’t know if big data is the right term anymore.

Many people like to think they have big data, and you kinda have to agree with them if you want their money. At least in consulting.

Also, you can go well beyond a 16 TB dataset on a single machine. You’re assuming the whole uncompressed dataset has to fit in memory, but many workloads don’t need that.
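A minimal sketch of what "doesn’t need to fit in memory" can look like: a single-pass aggregation that reads rows lazily and keeps only a running count and total. The function name, the CSV layout and the column name are hypothetical, chosen just for illustration.

```python
import csv
from typing import TextIO


def streaming_mean(f: TextIO, column: str) -> float:
    """Mean of one numeric CSV column, computed in a single pass.

    csv.DictReader pulls rows lazily from the file object, so memory use
    stays roughly constant no matter how large the file is. Only the
    running count and total are kept between rows.
    """
    count = 0
    total = 0.0
    for row in csv.DictReader(f):
        total += float(row[column])
        count += 1
    if count == 0:
        raise ValueError("no rows to average")
    return total / count
```

Usage would be something like `with open("events.csv", newline="") as f: streaming_mean(f, "latency_ms")` (hypothetical file and column). Because the function only needs a text stream, a compressed file can be passed in directly via `gzip.open("events.csv.gz", "rt", newline="")`, so the data never has to be decompressed to disk or held in RAM.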

How many people in the world have datasets that big to analyse within a reasonable time?

Some people say “extreme data”.