spwa4 5 hours ago
> I don't disagree, but how much of this ends up being distillation?

A lot, so you can bet tens of millions are flowing to Congress to have distillation declared illegal before this happens. And then it'll happen anyway.
lambda 5 hours ago
Distillation isn't only between different labs. A lab can train a large model and then distill a smaller model from it that retains most of the useful capability. I don't know enough to say whether that beats just training the smaller model directly, but I'd bet there are cases where it's useful. I could easily see it being easier to do the initial pre-training on a larger model and then distill everything useful down into a smaller one, essentially filtering out a lot of noise in the process.
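
For anyone unfamiliar with what "distill" means mechanically, here is a rough sketch of the usual soft-target objective: the small model (student) is trained to match the large model's (teacher's) output distribution rather than only the hard labels. This is a minimal illustration, not how any particular lab does it; the function names, the temperature value, and the example logits are all made up for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs.

    The temperature**2 factor is the conventional rescaling that keeps
    gradient magnitudes comparable across temperatures (an assumption
    borrowed from common practice, not from the comment above).
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits for a single 3-class prediction.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 1.0]
print(distillation_loss(teacher, student))
```

The loss is zero when the student's logits reproduce the teacher's distribution and grows as they diverge. In a real setup this term would be minimized over the student's weights, often mixed with an ordinary cross-entropy loss on ground-truth labels.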