alex43578 5 hours ago

That’s like saying you could train a state-of-the-art model by hand, and it would only cost you a lot of man-hours.

Realistically, to train a frontier model you’d need quite a lot of compute. GPT-4, which is old news, was reportedly trained on 25,000 A100s.

There’s just no reasonable way of catching up to modern hardware with old hardware plus extra time and electricity.
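
For a rough sense of the trade-off, here is a back-of-the-envelope sketch. Only the 25,000 A100 figure comes from the comment above. The 90-day run length and the "older chip at one tenth of an A100" are assumptions picked purely for illustration, and the function name `catch_up_days` is hypothetical. The model also assumes perfect linear scaling, which real clusters do not achieve.

```python
def catch_up_days(ref_chips, ref_days, ref_tflops, old_chips, old_tflops):
    """Days for old_chips to match the total compute of a reference run.

    Assumes perfect linear scaling and equal utilization. That flatters
    the older hardware: interconnect and memory limits usually cost more
    efficiency as the chip count grows.
    """
    total_work = ref_chips * ref_days * ref_tflops  # chip-days times TFLOPS
    return total_work / (old_chips * old_tflops)


# 25,000 A100s is the figure quoted above; everything else is an assumption.
A100_TFLOPS = 312.0  # peak dense BF16 throughput from NVIDIA's datasheet
REF_DAYS = 90.0      # assumed run length, not a confirmed number
OLD_TFLOPS = 31.2    # hypothetical older chip at one tenth of an A100

same_count = catch_up_days(25_000, REF_DAYS, A100_TFLOPS, 25_000, OLD_TFLOPS)
ten_x_chips = catch_up_days(25_000, REF_DAYS, A100_TFLOPS, 250_000, OLD_TFLOPS)

print(f"same chip count: {same_count:.0f} days")
print(f"10x the chips:   {ten_x_chips:.0f} days")
```

Under these idealized numbers the slower chips need either ten times the wall-clock time or ten times the chip count. Whether that is "reasonable" depends on how far real scaling falls short of linear and on how much power and money is available.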

fc417fc802 4 hours ago

Training methods and architectures keep getting more efficient by leaps and bounds, and scaling up was well into the realm of diminishing returns, last I checked. The necessity of exceeding 100B parameters seems questionable. Just because you can get some benefit by piling on ever more data doesn't mean you have to.

Also keep in mind we aren't talking about a small company wanting to do competitive R&D on a frontier model. We're talking about a world superpower that operates nuclear reactors and built something the size of the Three Gorges Dam deciding that a thing is strategically necessary. If they were willing to spend the money, I am absolutely certain that they could pull it off.