Predicting the Order of Upcoming Tokens Improves Language Modeling (arxiv.org)
6 points by wavelander a day ago | 1 comment

NitpickLawyer a day ago
Are any of these methods doable on pre-trained models? Like, freeze the model and only train these add-ons? Having to redo the training runs with these optimisations doesn't sound too practical in the grand scheme of things.
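For anyone wondering what "freeze the model and only train the add-ons" looks like mechanically, here is a toy sketch in plain Python. It is not the paper's method. A fixed random "backbone" stands in for the pre-trained model, and a small linear head stands in for the add-on; only the head's weights are updated, and the backbone's weights are never touched. All names (`backbone`, `head`, `train_head_only`) are hypothetical and exist only for this illustration. In PyTorch the usual equivalent is setting `requires_grad = False` on the backbone's parameters and handing only the head's parameters to the optimizer. The sketch says nothing about whether a token-order objective still helps when the backbone is frozen.

```python
import math
import random

# Toy illustration only: a frozen "pre-trained" backbone plus a small trainable
# head standing in for an add-on. All names here are hypothetical.

def backbone(x, w_frozen):
    """Frozen feature extractor: fixed linear map followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_frozen]

def head(h, w_head):
    """Trainable linear head: one logit per output class."""
    return [sum(w * hj for w, hj in zip(row, h)) for row in w_head]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mean_loss(data, w_frozen, w_head):
    """Average cross-entropy of the head on top of the frozen backbone."""
    total = 0.0
    for x, y in data:
        p = softmax(head(backbone(x, w_frozen), w_head))
        total -= math.log(p[y])
    return total / len(data)

def train_head_only(data, w_frozen, w_head, lr=0.1, epochs=200):
    """Plain SGD on cross-entropy that updates w_head and never touches w_frozen."""
    for _ in range(epochs):
        for x, y in data:
            h = backbone(x, w_frozen)  # features are treated as constants
            p = softmax(head(h, w_head))
            for k, row in enumerate(w_head):
                g = p[k] - (1.0 if k == y else 0.0)  # dLoss/dlogit_k
                for j in range(len(row)):
                    row[j] -= lr * g * h[j]
    return w_head

# Usage: random frozen weights, zero-initialised head, four labelled points.
random.seed(0)
w_frozen = [[random.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(8)]
w_head = [[0.0] * 8 for _ in range(2)]
data = [([1.0, 0.0], 0), ([1.0, 0.2], 0), ([0.0, 1.0], 1), ([0.1, 1.0], 1)]

snapshot = [row[:] for row in w_frozen]
before = mean_loss(data, w_frozen, w_head)
train_head_only(data, w_frozen, w_head)
after = mean_loss(data, w_frozen, w_head)
print(f"loss before: {before:.4f}, after: {after:.4f}")
```

The point is only that the loss on the add-on can go down while the "pre-trained" weights stay byte-for-byte identical; retraining from scratch is not required just to attach and fit a new head.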