That’s misunderstanding why these models are behind. A large part of why they’re behind is that they aren’t able to do the reinforcement learning post-training steps that take a pre-trained model and turn it into a frontier model like GPT 5 or Opus. Instead they do their best to recreate these models using distillation.

Fundamentally, you can never distill your way to being the teacher, so these approaches will not advance the frontier.

[edit: after thinking about it, I think my phrasing is unfair. It's not necessarily that they aren't able to do it, but they haven't yet shown that they are willing to do it.]


computerex 2 hours ago | parent | next [-]

That’s not remotely true. They did distillation as a cheap solution to the cold start problem. You need data/trajectories to hill climb to higher capabilities. All large Chinese labs do RLAIF.


sulam an hour ago | parent [-]

Oh yes, not remotely true. Which is why the frontier labs all have invested heavily in trying to identify and thwart distillers, using known company names / domains to drive their exclusion lists.

logicchains an hour ago | parent [-]

It's cheaper to distill than to do reinforcement learning, so of course they prefer that, but if it wasn't an option they could just pay up and spend more GPU time on RL.


FpUser 2 hours ago | parent | prev [-]

>"they aren’t able to do the reinforcement learning post-training steps"

Not yet.

If there is a need, someone will come and fulfill it. Personally, right now I do not even want to use top models. Professionally I use AI to help with coding, using the Junie agent that comes with the IDEs from JetBrains. Junie is told to use Gemini Flash and works fine for what I ("I" being the emphasis here) ask it to do. I tried more advanced models and different vendors, only to discover credits going down the toilet without any extra benefit.

sulam an hour ago | parent [-]

I'll agree I guess and clarify that the better phrasing is probably something like "haven't yet shown the capability to."