libraryofbabel a day ago

Agreed, at this point the RLVR tasks are probably long series of tool calls carrying out complex tasks in some simulated dev environment.

That said, I think it's hard to say how much of a difference it really makes in terms of making Claude Code specifically better than other coding agents using the same LLM (versus just making the LLM better for all coding agents using roughly similar tools). There is probably some difference, but you'd need to run a lot of benchmarks to find out.

aszen a day ago

Agreed, it probably contributes to the model improving for all agents, but crucially it is verifiably better with their own agent. So they get a good feedback loop to improve both the model and the agent.