Normal_gaussian 10 hours ago
Big data's "no true Scotsman" problem:

> Despite what their name might suggest, so-called "large language models" (LLMs) are trained on relatively small datasets. [1][2][3] For starters, all the aforementioned measurements are described in terms of terabytes (TBs), which is not typically a unit of measurement one uses when referring to "big data." Big data is measured in petabytes (1,000 times larger than a terabyte), exabytes (1,000,000 times larger), and sometimes zettabytes (1,000,000,000 times larger).
williamtrask 10 hours ago
(OP here) I'm with you on that analysis. This was an effort to make the piece legible to a (primarily) non-technical policy audience. The rigorous numbers are in other parts of the piece (and in the sources behind them).
eichin 10 hours ago
The joke (10 years ago) was that "big data" means "doesn't fit on my Mac". Kind of still works...