famouswaffles 2 days ago

Curriculum learning is not really a thing for these large SOTA LLM training runs (specifically pre-training). We know it would help, but ordering trillions of tokens of data in this way would be a herculean task.

ACCount37 2 days ago

I've heard things about pre-training optimization, "soft start" schedules and the like, so I struggle to believe that curriculum learning isn't used in any frontier run.

Sure, it's a lot of data to sift through, and the time and cost of doing so can be substantial. But if you're already planning on funneling all of it through a 1T-parameter LLM? You might as well pass the fragments through a small classifier first and order the data by its scores.
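
A minimal sketch of that idea, in Python. The scorer below is a crude stand-in (average word length), and the corpus is made up; a real run would replace it with a small trained classifier that estimates per-fragment difficulty or quality before the pre-training pass.

    from typing import Iterable, List, Tuple

    def difficulty_score(fragment: str) -> float:
        # Placeholder scorer: longer average word length ~ "harder" text.
        # In practice this would be the output of a small classifier.
        words = fragment.split()
        if not words:
            return 0.0
        return sum(len(w) for w in words) / len(words)

    def curriculum_order(fragments: Iterable[str]) -> List[str]:
        # Sort fragments from "easy" to "hard" for a curriculum-style pass.
        scored: List[Tuple[float, str]] = [(difficulty_score(f), f) for f in fragments]
        scored.sort(key=lambda pair: pair[0])
        return [f for _, f in scored]

    if __name__ == "__main__":
        corpus = [
            "The cat sat on the mat.",
            "Stochastic gradient descent minimizes the empirical risk.",
            "Dogs bark.",
        ]
        for frag in curriculum_order(corpus):
            print(f"{difficulty_score(frag):.2f}  {frag}")

The classifier is cheap relative to the main run: scoring is a single small forward pass per fragment, versus the 1T-parameter model seeing every token anyway.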