fooker 4 days ago:
No, if you try to train an LLM like you're suggesting:

- you'll get something similar to GPT-2.

- to approach the scale of modern LLMs, you'd need about 10x more than all the GPUs in the world.

It's a neat abstraction to consider these the same, but do you think Meta is paying $100M for writing a 15-line script?
kouteiheika 4 days ago:
I still don't understand what exactly you are disagreeing with. Meta is paying the big bucks because to train a big LLM in a reasonable time you need *scale*. But the process itself is the same as full fine-tuning, just scaled up across many GPUs. If I were patient enough to wait a few years/decades for my single GPU to chug through 15 trillion tokens, then I too could train a Llama from scratch (assuming I fed it the same training data).
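To make the point concrete, here's a minimal sketch of that claim in PyTorch. The tiny model, fake_batches generator, and checkpoint path are placeholders I made up, not anything real; the point is only that pre-training and full fine-tuning share one optimization loop, and differ in starting weights and data, not in algorithm.

    import torch
    import torch.nn as nn

    def train(model, batches, steps=100, lr=3e-4):
        # The same loop serves both regimes; only the initial weights
        # and the data stream change.
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _, (tokens, targets) in zip(range(steps), batches):
            logits = model(tokens)  # [batch, seq, vocab]
            loss = loss_fn(logits.flatten(0, 1), targets.flatten())
            loss.backward()
            opt.step()
            opt.zero_grad()
        return model

    # Toy stand-in for an LLM so the script actually runs.
    vocab, dim = 1000, 64
    model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))

    def fake_batches(batch=8, seq=32):
        while True:
            tokens = torch.randint(0, vocab, (batch, seq))
            yield tokens, tokens  # dummy targets, just to exercise the loop

    # "Pre-training": random init, then this loop over ~15T tokens' worth of data.
    train(model, fake_batches())

    # "Full fine-tuning": load existing weights, then run the exact same loop
    # on a much smaller, task-specific stream.
    # model.load_state_dict(torch.load("llama.pt"))  # hypothetical checkpoint path
    train(model, fake_batches())

And for a rough sense of the wait (using the common ~6·N·D FLOPs approximation, so take it as an order-of-magnitude estimate): a 70B-parameter model on 15T tokens is around 6e24 FLOPs, which at ~1e15 FLOP/s for a single top-end GPU at full utilization works out to roughly two centuries. That's the gap the many-GPU clusters are buying down, not a different training procedure.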