Normal_gaussian 10 hours ago

Big data's no true scotsman problem:

> Despite what their name might suggest, so-called “large language models” (LLMs) are trained on relatively small datasets.1 2 3 For starters, all the aforementioned measurements are described in terms of terabytes (TBs), which is not typically a unit of measurement one uses when referring to “big data.” Big data is measured in petabytes (1,000 times larger than a terabyte), exabytes (1,000,000 times larger), and sometimes zettabytes (1,000,000,000 times larger).
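The unit relationships in the quote reduce to simple powers of 1,000, which can be sketched as arithmetic (a minimal illustration; the 45 TB corpus size used below is a hypothetical figure, not a number from the article):

```python
# Decimal (SI) data units expressed relative to one terabyte.
# These factors follow the quote: a petabyte is 1,000x a terabyte,
# an exabyte 1,000,000x, a zettabyte 1,000,000,000x.
TB = 1
PB = 1_000 * TB
EB = 1_000_000 * TB
ZB = 1_000_000_000 * TB

# Hypothetical example: a 45 TB training corpus is a tiny fraction
# of a single petabyte, let alone the exabyte/zettabyte scales
# usually associated with "big data".
corpus_tb = 45 * TB
print(corpus_tb / PB)  # 0.045 petabytes
print(corpus_tb / EB)  # 4.5e-05 exabytes
```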

williamtrask 10 hours ago | parent | next [-]

(OP here) — with you on that analysis. This was in an effort to make the piece legible for a (primarily) non-technical, policy audience. Rigorous numbers are in other parts of the piece (and in the sources behind them).

eichin 10 hours ago | parent | prev [-]

The joke (10 years ago) was that "big data" means "doesn't fit on my Mac". Kind of still works...

pimlottc 10 hours ago | parent [-]

Isn’t that basically true? The crux of big data is that it requires different techniques, since you can’t just process it on one device.