That’s not remotely true. They did distillation as a cheap solution to the cold start problem. You need data/trajectories to hill climb to higher capabilities. All large Chinese labs do RLAIF.

▲

sulam an hour ago | parent [-]

Oh yes, not remotely true. Which is why the frontier labs all have invested heavily in trying to identify and thwart distillers, using known company names / domains to drive their exclusion lists.

	▲	logicchains an hour ago \| parent [-]
		It's cheaper to distill than to do reinforcement learning, so of course they prefer that, but if it wasn't an option they could just pay up and spend more GPU time on RL.