coalteddy 4 days ago |
Very cool. Love this. Was the training more heavily weighted towards Swiss languages, and how does the model perform on Swiss languages compared to others? Are there any plans for further models after this one?
lllllm 3 days ago | parent |
The pretraining (so 99% of training) is fully global, covering over 1000 languages without special weighting. The posttraining (see Section 4 of the paper) also included as many languages as we could get, and did upweight some languages. The posttraining can easily be customized to any other target languages.
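For a concrete (purely illustrative) picture of what language upweighting in a posttraining mixture can look like, here is a minimal sketch of temperature-based sampling over per-language data shares. The language counts and temperature below are made up, and this is a common technique, not necessarily the exact scheme used for this model:

    # Illustrative sketch: temperature-based language upweighting for a
    # posttraining data mixture. Counts and temperature are hypothetical.

    def mixture_weights(examples_per_language, temperature=0.5):
        """Return sampling probabilities that upweight low-resource languages.

        A temperature < 1 flattens the distribution, so languages with less
        data are sampled more often than their raw share would suggest.
        """
        total = sum(examples_per_language.values())
        raw_shares = {lang: n / total for lang, n in examples_per_language.items()}
        scaled = {lang: share ** temperature for lang, share in raw_shares.items()}
        norm = sum(scaled.values())
        return {lang: s / norm for lang, s in scaled.items()}

    if __name__ == "__main__":
        # Hypothetical example counts for the four Swiss national languages.
        counts = {"de": 1_000_000, "fr": 400_000, "it": 150_000, "rm": 5_000}
        for lang, p in mixture_weights(counts).items():
            print(f"{lang}: {p:.3f}")

With temperature 1.0 the sampling probabilities match the raw data shares; lowering it shifts probability mass towards low-resource languages such as Romansh in this toy example.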