jawns | 7 hours ago:
I think it's partly tongue-in-cheek: when "big data" was overhyped, everyone claimed they were working with big data, or tried to sell expensive solutions for it, and some reasonable minds spoke up and pointed out that a standard laptop could process more "big data" than people thought.
rattray | 7 hours ago:
> For our first experiment, we used ClickBench, an analytical database benchmark. ClickBench has 43 queries that focus on aggregation and filtering operations. The operations run on a single wide table with 100M rows, which uses about 14 GB when serialized to Parquet and 75 GB when stored in CSV format.

Very much so…
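A toy sketch of the shape of query the quoted benchmark runs: filter rows, then group and aggregate over one wide table. The table and column names here are made up, and the data is a tiny in-memory sample rather than ClickBench's 100M-row dataset.

```python
# ClickBench-style aggregation + filtering on a single wide table,
# scaled down to a handful of synthetic rows in in-memory SQLite.
# (Table name "hits" and the columns are illustrative assumptions.)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (user_id INTEGER, region TEXT, duration_ms INTEGER)"
)
rows = [
    (1, "eu", 120), (2, "eu", 300), (3, "us", 50),
    (4, "us", 400), (5, "eu", 80),
]
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)

# Count and average duration per region, for longer sessions only.
result = conn.execute(
    """
    SELECT region, COUNT(*), AVG(duration_ms)
    FROM hits
    WHERE duration_ms > 100
    GROUP BY region
    ORDER BY region
    """
).fetchall()
print(result)  # [('eu', 2, 210.0), ('us', 1, 400.0)]
```

The same query shape runs fine over 100M rows on one laptop; only the storage engine (e.g. Parquet plus a columnar query engine) changes.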
|
rrr_oh_man | 6 hours ago:
In my former life as a soulless consultant, mid-level IT managers really liked to hear the three "V"s mentioned: Velocity, Volume, Variety.
speedgoose | 6 hours ago:
Computers got bigger and software got smarter. You have phones that are faster than the cloud VMs of the past, and you can rent bare-metal servers with up to 344 cores and 16 TB of RAM. I used to share your definition too, but now I say that if it doesn’t open in Microsoft Excel, it’s big data.
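Taking the "opens in Excel" definition above half-seriously: an Excel worksheet caps out at 1,048,576 rows and 16,384 columns, so the definition reduces to a size check. The dataset dimensions below are made-up examples.

```python
# Excel worksheet limits: 1,048,576 rows x 16,384 columns.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384

def is_big_data(n_rows: int, n_cols: int) -> bool:
    """True if the table cannot fit on a single Excel worksheet."""
    return n_rows > EXCEL_MAX_ROWS or n_cols > EXCEL_MAX_COLS

print(is_big_data(500_000, 30))       # False: fits in one worksheet
print(is_big_data(100_000_000, 100))  # True: a 100M-row table does not
```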
Zambyte | 6 hours ago:
Processing data that cannot be processed on a single machine is a fundamentally different problem than processing data that can. It's useful to have a term for that. As you say, single machines can scale up incredibly far; that just means 16 TB datasets no longer demand big data solutions.
speedgoose | 6 hours ago:
I get your point, but I don’t know if "big data" is the right term anymore. Many people like to think they have big data, and you kind of have to agree with them if you want their money, at least in consulting.

Also, you could go well beyond a 16 TB dataset on a single machine. You assume that the whole uncompressed dataset has to fit in memory, but many workloads don’t need that. How many people in the world have such big datasets to analyse within a reasonable time? Some people say "extreme data".
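A minimal sketch of the point that the dataset does not have to fit in memory: an out-of-core aggregation that streams rows one at a time, so only the per-key running totals live in RAM. The file name and column names are hypothetical; the example feeds an in-memory sample instead of a real file.

```python
# Out-of-core (streaming) aggregation: memory use is proportional to
# the number of distinct keys, not the number of rows.
import csv
import io
from collections import defaultdict

def mean_by_key(lines, key_col, value_col):
    """Stream CSV rows, keeping only per-key running sums and counts."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        sums[row[key_col]] += float(row[value_col])
        counts[row[key_col]] += 1
    return {k: sums[k] / counts[k] for k in sums}

# On a real dataset you would pass an open file object, e.g.:
#   with open("events.csv") as f:
#       mean_by_key(f, "region", "latency")
sample = io.StringIO("region,latency\neu,100\nus,300\neu,200\n")
print(mean_by_key(sample, "region", "latency"))  # {'eu': 150.0, 'us': 300.0}
```

The same pattern (chunked or streaming scans over compressed columnar files) is how single-machine engines handle tables far larger than RAM.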
|
|
|
bcye | 7 hours ago:
I think they are simply referring to analytical workloads.
|
brudgers | 6 hours ago:
"Your data isn’t big" is a good working definition of big data. Google has big data. You are not Google.
antonyh | 4 hours ago:
I think the definition of "big" is smaller than that. Mine was "too big to fit on a maxed-out laptop", effectively >8 TB. Our photo collection is bigger than that, and it's not "big data". Or one could define it as too big to fit on a single SSD/HDD, maybe >30 TB: still within the reach of a hobbyist, but too large to process in memory and needing special tools to work with. It doesn't have to be petabyte scale to need "big data" tooling.
|
|