Remix clone Hacker News


		Retr0id 3 hours ago
		> RLVR is weirder, and I suspect it's why we see "It's not X, it's Y" so often.
		This feels like an easy enough hypothesis to verify for anyone in the business of training LLMs: does the not-X-but-Y rate increase after RLVR?
		andy99 3 hours ago
		It’s unlikely this is true. LLMs are way more mad-libs/templates than we like to admit. That’s (ironically) not a judgement about their capability; it’s primarily just an observation. But it’s also what plain old SFT, which I believe is the primary culprit, ends up imparting.
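Retr0id's proposed check (does the not-X-but-Y rate increase after RLVR?) could be sketched roughly as follows. This is a minimal Python sketch, assuming you already have sampled responses from a checkpoint before RLVR and one after, ideally on the same prompts. The regex NOT_X_BUT_Y and the function pattern_rate are hypothetical names, and the regex is only a rough heuristic for the "It's not X, it's Y" / "not X, but Y" construction: it will miss paraphrases and catch some false positives. The same rate could also be computed on base and SFT checkpoints to probe andy99's suggestion that SFT is the main culprit.

```python
import re
from typing import Iterable

# Rough heuristic (an assumption, not an established detector) for the
# "It's not X, it's Y" / "not X, but Y" construction. It matches "not",
# a short span without sentence-ending punctuation, a separator
# (comma, semicolon, colon, en/em dash or hyphen), then "but", "rather",
# "it's" (straight or curly apostrophe) or "it is".
NOT_X_BUT_Y = re.compile(
    r"\bnot\s+(?:just\s+|only\s+|merely\s+)?"
    r"[^.!?\n]{1,80}?"
    r"[,;:\u2013\u2014-]\s*"
    r"(?:but|rather|it[\u2019']?s|it is)\b",
    re.IGNORECASE,
)


def pattern_rate(responses: Iterable[str]) -> float:
    """Fraction of responses containing at least one not-X-but-Y match."""
    responses = list(responses)
    if not responses:
        return 0.0
    hits = sum(1 for text in responses if NOT_X_BUT_Y.search(text))
    return hits / len(responses)


if __name__ == "__main__":
    # Toy strings for illustration only; not real model outputs.
    before_rlvr = ["The cache is stale.", "It's not a bug, it's a feature."]
    after_rlvr = ["It's not slow, it's thorough.", "This is not just fast, but reliable."]
    print(f"before RLVR: {pattern_rate(before_rlvr):.2f}")
    print(f"after RLVR:  {pattern_rate(after_rlvr):.2f}")
```

Counting responses that contain at least one match, rather than total matches, keeps a single verbose response from dominating the rate. With real samples you would also want enough responses per checkpoint, and some significance test, before reading anything into a difference.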