olliepro 7 hours ago

This would likely only get used for small finetuning jobs. It’s too slow for the scale of pretraining.

onion2k 7 hours ago | parent | next [-]

It’s too slow for the scale of pretraining.

There isn't really such a thing as 'too slow' as an objective fact though. It depends on how much patience and money for electricity you have. In AI image gen circles I see people complaining if a model takes more than 5s to generate an image, and other people on very limited hardware who happily wait half an hour per image. It's hard to make a judgement call about what 'too slow' means; it's quite subjective.

jandrese 7 hours ago | parent | next [-]

If it would take so long to train that the model will be obsolete before the training is finished, that might be considered too long. With ML you can definitely hit a point where it is too slow for any practical purpose.

ismailmaj 6 hours ago | parent [-]

Obsolete because of what? With limited hardware you're never aiming for state of the art, and for fine-tuning you don't train for very long anyway.

jandrese 6 hours ago | parent [-]

Because there is a new model that is better, faster, more refined, etc...

If your training time is measured in years or decades it probably won't be practical.

jwilber 6 hours ago | parent | prev [-]

That’s just playing semantics. Nobody is talking about "objective facts", nor do we need to define them here. If the step time is measured in days and your model takes years to train, then it will never be trained to completion on consumer hardware (which is the entire point).

greenavocado 7 hours ago | parent | prev | next [-]

So distribute copies of the model in RAM to multiple machines, have each machine update different parts of the model weights, and sync updates over the network
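The scheme described above (full weight replicas, each machine responsible for updating a different slice of the parameters, with a sync round over the network) can be sketched as a toy in-process simulation. This is a minimal illustration, not the commenter's actual setup: the linear model, data shards, and the centrally computed gradient average standing in for a reduce-scatter/all-gather exchange are all assumptions for the sketch.

```python
import numpy as np

def local_gradient(w, X, y):
    # Gradient of mean squared error for a linear model y ~ X @ w.
    return 2 * X.T @ (X @ w - y) / len(y)

N_MACHINES = 4
DIM = 4  # weight dimension, divisible by N_MACHINES (hypothetical sizes)

rng = np.random.default_rng(0)
true_w = rng.normal(size=DIM)

# Each "machine" gets its own synthetic data shard.
shards = []
for _ in range(N_MACHINES):
    X = rng.normal(size=(64, DIM))
    shards.append((X, X @ true_w))

# Every machine holds a full copy of the model weights in RAM.
replicas = [np.zeros(DIM) for _ in range(N_MACHINES)]
# Each machine "owns" (i.e. is responsible for updating) one slice of them.
slices = np.array_split(np.arange(DIM), N_MACHINES)

lr = 0.05
for _ in range(300):
    # Each machine computes a gradient on its local shard.
    grads = [local_gradient(w, X, y) for w, (X, y) in zip(replicas, shards)]
    # Stand-in for a reduce-scatter: machine i effectively receives the
    # averaged gradient for the parameter slice it owns.
    avg = np.mean(grads, axis=0)
    # Each machine updates only the parameters it owns...
    for i, idx in enumerate(slices):
        replicas[i][idx] -= lr * avg[idx]
    # ...then an all-gather-style sync copies every owned slice to every
    # replica, so all copies are identical again before the next step.
    for i, idx in enumerate(slices):
        for w in replicas:
            w[idx] = replicas[i][idx]
```

Because the gradient slices are averaged before each owner applies them, this is mathematically the same update as plain synchronous data-parallel SGD; the split only changes who does the optimizer work and what gets sent over the wire, which is the appeal of this layout.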

olliepro 4 hours ago | parent [-]

decentralized training makes a lot more sense when the required hardware isn't a $40K GPU...
