jszymborski 3 days ago
It's also already used in language modelling: MLM (masked language modelling) is another name for training models on the cloze task, and it's the most common way to train encoder-only models. CLM (causal language modelling) is the other common objective, where you autoregressively predict the next token given the previous ones; it's the most common way to train decoder-only models.
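
To make the difference concrete, here is a rough sketch of how training pairs could be built for each objective from the same token sequence. The function names (`mlm_example`, `clm_example`), the `[MASK]` string, and the whitespace tokenisation are all illustrative assumptions, not any particular library's API. The masking is also simplified: BERT-style recipes usually select about 15% of tokens and do not always replace them with the mask token.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol for this sketch


def mlm_example(tokens, mask_prob=0.15, rng=None):
    """Cloze-style pair: hide some tokens, predict only the hidden ones."""
    if rng is None:
        rng = random.Random(0)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)   # the model sees the mask...
            labels.append(tok)    # ...and must recover the original token
        else:
            inputs.append(tok)    # visible context, on both sides of a mask
            labels.append(None)   # no loss at unmasked positions
    return inputs, labels


def clm_example(tokens):
    """Next-token pair: the target at position i is the token at i + 1."""
    return tokens[:-1], tokens[1:]


tokens = "the cat sat on the mat".split()

mlm_inputs, mlm_labels = mlm_example(tokens, mask_prob=0.3, rng=random.Random(42))
clm_inputs, clm_targets = clm_example(tokens)

print("MLM inputs :", mlm_inputs)
print("MLM labels :", mlm_labels)
print("CLM inputs :", clm_inputs)
print("CLM targets:", clm_targets)
```

In the MLM pair the model can look at context on both sides of each masked slot, which is why it suits encoder-only models. In the CLM pair every position is a prediction target, and a decoder-only model additionally applies a causal attention mask so that position i only sees tokens up to i.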