Remix clone Hacker News

	▲	mike_hearn 3 hours ago
		Yes, there's a whole ecosystem of companies that create and sell RL gyms to AI labs, and of course the labs develop their own internally too. You don't hear much about this ecosystem because RL at scale is all private; there is nearly no academic research on it. A lot of this is probably just throwing roughly equal amounts of compute at continuous RLVR (reinforcement learning with verifiable rewards) training. I'm not convinced there's any big research breakthrough that separates GPT 5.4 from 5.2. The diff is probably more than just checkpoints but less than neural architecture changes, and closer to the former than the latter. I think it's just easy to underestimate how much impact continuous training plus scaling can have on the underlying capabilities.