7e 7 hours ago

2 PB? They will not come close to training on that amount. Maybe years from now.

sgt 7 hours ago

I think they will not train on the full 2 PB, but will use it as the data lake to start from and then apply a more targeted approach.

winddude 6 hours ago

If you read the article, the 2 PB is flash storage in the data pipeline, used to dedupe, clean, normalize, etc. the 60 PB of raw data before training.

Den_VR 7 hours ago

Could probably do a LoRA with that.

huflungdung 7 hours ago | parent | prev [-]

[dead]