kouteiheika 4 days ago
No, they're not. The process is essentially identical, just with a much lower total FLOPs budget, since if you're not training from scratch you don't need to train for as long. I can use *exactly* the same code I used to fine-tune a model to train a new model from scratch. Literally the only differences are whether I initialize the weights randomly or from an existing model, a couple of hyperparameters (e.g. when training from scratch you want to start at a higher learning rate), and how long I train.
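A minimal sketch of that claim, assuming a Hugging Face / PyTorch setup (the model name, learning rates, and step counts below are illustrative assumptions, not taken from the comment): the loop itself is shared, and only the initialization and a couple of hyperparameters switch.

    import torch
    from transformers import AutoConfig, AutoModelForCausalLM

    FROM_SCRATCH = True  # flip to switch between pretraining and fine-tuning

    if FROM_SCRATCH:
        # Same architecture, randomly initialized weights.
        config = AutoConfig.from_pretrained("gpt2")      # architecture chosen for illustration
        model = AutoModelForCausalLM.from_config(config)
        lr, total_steps = 6e-4, 600_000                   # higher LR, far more steps (assumed values)
    else:
        # Start from existing pretrained weights.
        model = AutoModelForCausalLM.from_pretrained("gpt2")
        lr, total_steps = 2e-5, 3_000                     # lower LR, far fewer steps (assumed values)

    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    # The rest of the loop (batching, forward pass, loss.backward(), optimizer.step())
    # is exactly the same code in both branches.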
fooker 4 days ago | parent
No. If you try to train an LLM the way you're suggesting:

- you'll get something similar to GPT-2;
- to approach the scale of modern LLMs, you'd need about 10x more than all the GPUs in the world.

It's a neat abstraction to consider these the same, but do you think Meta is paying $100M for writing a 15-line script?