Remix clone Hacker News

jszymborski 3 days ago
It's also already used for language modelling. MLM (masked language modelling) is another name for training models on the cloze task, and it's the most common way to train encoder-only models. CLM (causal language modelling) is the other common task, where you autoregressively predict the next token given the previous ones; it's the most common way to train decoder-only models.
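To make the difference concrete, here's a rough sketch of how training examples might be built for each objective. It works on plain string tokens. The names `MASK`, `IGNORE`, `choose_mask_positions`, `mlm_example` and `clm_pairs` are made up for illustration and aren't from any library.

The simplifications are mine. Every selected token is always replaced with the mask token, and the 15% rate is just a commonly cited default. Real BERT-style recipes sometimes keep or randomize a share of the selected tokens instead.

```python
import random

MASK = "[MASK]"  # hypothetical mask token
IGNORE = None    # hypothetical marker: no loss is computed at this position


def choose_mask_positions(n, mask_prob=0.15, seed=0):
    """Pick roughly mask_prob of the n positions to hide (seeded for reproducibility)."""
    rng = random.Random(seed)
    return {i for i in range(n) if rng.random() < mask_prob}


def mlm_example(tokens, positions):
    """Cloze / MLM: hide the chosen tokens and score only those positions.

    The model sees context on both sides of each mask, which is why this
    objective pairs naturally with encoder-only (bidirectional) models.
    """
    inputs = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = [t if i in positions else IGNORE for i, t in enumerate(tokens)]
    return inputs, targets


def clm_pairs(tokens):
    """Causal LM: for each position, predict the next token from the prefix only.

    In practice this is one forward pass with a causal attention mask; the
    explicit (prefix, next_token) pairs just spell out what gets predicted.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


if __name__ == "__main__":
    toks = ["the", "cat", "sat", "on", "the", "mat"]
    print(mlm_example(toks, {1, 4}))
    for prefix, nxt in clm_pairs(toks):
        print(prefix, "->", nxt)
```

With `{1, 4}` masked, the MLM inputs become `["the", "[MASK]", "sat", "on", "[MASK]", "mat"]`, and only "cat" and "the" are targets. The CLM side yields five (prefix, next token) pairs, starting with `["the"] -> "cat"`.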