dagss | 2 hours ago
Not an AI researcher and I don't really know, but intuitively it makes a lot of sense to me. To do well, an LLM wants to end up with weights that get as far as possible in the direction of "reasoning". Suppose that with just one language it's possible to get stuck in local optima: weights that score well on the English test set but don't reason well.

Now take the same model size but require it to learn several languages with that same number of weights. That should eliminate many of those local optima, because unless the weights land in a regime where real reasoning and deeper concepts are "understood", there's no way to do well in several languages at once on the same parameter budget. And speaking several languages naturally brings in more abstraction: the concept of "cat" is distinct from the word "cat" in any given language, and so on.
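To make the "shared weights eliminate shortcut solutions" intuition concrete, here's a toy least-squares sketch (purely my analogy, nothing from actual LLM training): two "languages" encode the same latent signal, but a spurious shortcut feature only predicts the label in one of them. A model fit on one language alone latches onto the shortcut; forcing the same weights to fit both languages pushes them toward the shared signal.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Shared latent "concept" that both languages encode.
s = rng.standard_normal(n)
e_a, e_b = 0.5 * rng.standard_normal(n), 0.5 * rng.standard_normal(n)

# "English" task: a shortcut feature that almost equals the label.
y_a = s + e_a
spur_a = y_a + 0.01 * rng.standard_normal(n)  # spurious, predictive here only

# "French" task: the same shortcut feature carries no information.
y_b = s + e_b
spur_b = rng.standard_normal(n)

X_a = np.column_stack([s, spur_a])
X_b = np.column_stack([s, spur_b])

# Fit on English alone: the model latches onto the shortcut feature.
w_single, *_ = np.linalg.lstsq(X_a, y_a, rcond=None)

# Fit jointly with shared weights: the shortcut stops paying off,
# so the weight shifts back onto the shared signal.
X = np.vstack([X_a, X_b])
y = np.concatenate([y_a, y_b])
w_joint, *_ = np.linalg.lstsq(X, y, rcond=None)

print("single-task weights (signal, shortcut):", w_single)
print("joint weights       (signal, shortcut):", w_joint)
```

Running this, the single-task fit puts nearly all its weight on the shortcut, while the joint fit puts most of its weight on the signal. It's a crude analogy for the claim above: adding a second language doesn't add capacity, it removes degenerate solutions.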