55555 4 hours ago
What are companies needing all of these hard drives for? I understand the need for memory and boot drives, but storing text training data and text conversations isn't that space intensive. There are a few companies doing video models, so I can see how that takes a tremendous amount of space. Is it just that?
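For scale, my rough back-of-envelope (every number below is my own illustrative guess, not a figure from any lab):

    # Back-of-envelope: text corpora are tens of TB, video corpora are petabytes.
    # All inputs here are assumptions for illustration.
    TB = 1e12

    # Text: ~15 trillion tokens at ~4 bytes per token.
    text_bytes = 15e12 * 4
    print(f"text:  ~{text_bytes / TB:.0f} TB")   # ~60 TB

    # Video: 1 million hours of compressed video at ~1 GB per hour.
    video_bytes = 1e6 * 1e9
    print(f"video: ~{video_bytes / TB:.0f} TB")  # ~1000 TB (1 PB)

Even a deduplicated web-scale text dump fits on a handful of drives, which is what makes the scale of these HDD orders puzzling if it were text alone.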
Ekaros 4 hours ago
Hearing about their scraping practices, it might be that they are storing the same data over and over again. And then yes, audio and video are likely something they are planning for or already gathering. And if they produce a lot of video, they might keep copies around.
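Avoiding that would mean deduplicating by content hash before anything hits disk; a minimal sketch (function and variable names are my own):

    import hashlib

    seen: set[str] = set()

    def store_if_new(page_bytes: bytes) -> bool:
        # Identical copies of a page hash to the same key, so only the
        # first copy gets written; later copies are skipped.
        key = hashlib.sha256(page_bytes).hexdigest()
        if key in seen:
            return False
        seen.add(key)
        # ... write page_bytes to storage here ...
        return True

Anything short of that, and every re-crawl of the same page eats more disk.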
red75prime 4 hours ago
All the latest general-purpose models are multimodal (except DeepSeek, I think). Transfer learning lets them keep improving results even after they have exhausted all the text on the internet.
pixelesque 4 hours ago
Storing training data: for example, Anthropic bought millions of second-hand books and scanned them: https://www.washingtonpost.com/technology/2026/01/27/anthrop...
numpad0 3 hours ago
I think the somewhat hallucinatory canned response is that they distribute data across many drives for massive aggregate throughput. Though I don't know if that even technically makes sense...
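The mechanics would look something like this: split a dataset into shards, put one shard per drive, and read them in parallel so throughput scales with the number of spindles. A rough sketch (mount paths and shard layout are invented):

    from concurrent.futures import ThreadPoolExecutor
    from pathlib import Path

    # 16 hypothetical drives, each mounted separately and holding one shard.
    MOUNTS = [Path(f"/mnt/disk{i:02d}") for i in range(16)]

    def read_shard(mount: Path, shard_name: str) -> bytes:
        # Each drive serves only its own shard, so reads don't contend
        # on a single spindle.
        return (mount / shard_name).read_bytes()

    def read_striped(shard_name: str) -> list[bytes]:
        # Aggregate throughput is roughly (number of drives) x (per-drive rate).
        with ThreadPoolExecutor(max_workers=len(MOUNTS)) as pool:
            return list(pool.map(read_shard, MOUNTS, [shard_name] * len(MOUNTS)))

Whether that explains buying drives by the warehouse is a different question.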
jmclnx 4 hours ago
I am surprised by that too. I thought everyone had moved to SSDs or NVMe? I was toying with getting a 2 TB HDD for a BSD system I have; I guess not now :)