Data filtering. Dataset curation. Curriculum learning. All already in use.

It's not sexy, it's not a breakthrough, but it does help.

> All already in use.

At the big labs that makes sense. I'm a bit more puzzled by why it isn't used in the toy projects. It certainly adds complexity, but it seems like it would make a big difference.

famouswaffles 2 days ago | parent | prev [-]

Curriculum learning is not really a thing for these large SOTA LLM training runs (specifically pre-training). We know it would help, but ordering trillions of tokens of data in this way would be a herculean task.

	ACCount37 2 days ago | parent [-]
		I've heard things about pre-training optimization. "Soft start" and such. So I struggle to believe that curriculum learning is not a thing on any frontier runs. Sure, it's a lot of data to sift through, and the time and cost to do so can be substantial. But if you are already planning on funneling all of that through a 1T LLM? You might as well pass the fragments through a small classifier before you do that.
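
For illustration, here is a minimal sketch of the "small classifier first" idea: score every fragment cheaply, drop the junk, and feed the rest easy-to-hard. Everything in it is an assumption made up for the example. `toy_difficulty` is a hypothetical word-length heuristic standing in for a real small classifier, and `Fragment`, `build_curriculum`, and the `min_words` threshold are illustrative names, not anything a lab is known to use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Fragment:
    text: str
    score: float  # difficulty in [0, 1], assigned by the scorer


def toy_difficulty(text: str) -> float:
    """Hypothetical stand-in for a small classifier: longer words count as harder."""
    words = text.split()
    if not words:
        return 0.0
    mean_len = sum(len(w) for w in words) / len(words)
    return min(mean_len / 10.0, 1.0)


def build_curriculum(
    texts: Iterable[str],
    scorer: Callable[[str], float] = toy_difficulty,
    min_words: int = 3,  # assumed filter threshold, purely illustrative
) -> List[Fragment]:
    """Filter out very short fragments, then order the rest from easy to hard."""
    kept = [Fragment(t, scorer(t)) for t in texts if len(t.split()) >= min_words]
    return sorted(kept, key=lambda f: f.score)


corpus = [
    "thermodynamic equilibrium constrains irreversible processes",
    "hi",
    "the cat sat on the mat",
]
curriculum = build_curriculum(corpus)
for frag in curriculum:
    print(f"{frag.score:.2f}  {frag.text}")
```

At real scale the scorer would run as a batched, sharded pass over the corpus rather than an in-memory sort, but the shape is the same: one cheap score per fragment, computed once, reused for both filtering and ordering.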