LarsDu88 5 hours ago
This was exactly what I was thinking of. RLVR is the secret sauce behind o3 and its many successors. It's the secret sauce behind why the current models are so great at coding and soon to be unbeatable at math. LLMs can pose many questions and, if the answers are easily verifiable, fine-tune very heavily on them. A lot of the world models discussion will inevitably lean into simulations as verification.
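To make the "easily verifiable" part concrete, here is a toy sketch in Python. It is not how any lab actually trains: real RLVR uses a policy-gradient update, while this only shows the verifier and a simple keep-what-passes filter. The names `verify_sum` and `select_for_finetuning` and the addition task are made up for illustration.

```python
def verify_sum(question: tuple[int, int], answer: str) -> float:
    """Hypothetical verifier: reward 1.0 if the answer parses to the right sum, else 0.0."""
    a, b = question
    try:
        return 1.0 if int(answer.strip()) == a + b else 0.0
    except ValueError:
        # Unparseable answers get no reward.
        return 0.0


def select_for_finetuning(question, candidates, verifier):
    """Keep only the sampled answers that the verifier accepts."""
    return [c for c in candidates if verifier(question, c) == 1.0]


# Pretend the model sampled four answers to "2 + 3 = ?".
kept = select_for_finetuning((2, 3), ["5", "6", " 5 ", "five"], verify_sum)
```

The point is that the reward comes from a cheap programmatic check rather than a human label, which is why the approach scales in domains like code and math. A simulation used as a verifier would play the same role as `verify_sum` here.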
code_biologist 27 minutes ago
I'll admit that I miss having access to the ChatGPT 4.5 "absolutely gigantic model" with enough tuning to make it sane and useful. The RLVR models are superb for actual tasks in those RLVR domains, but that fine-tuned view of the world as a verifiable problem to solve makes them feel worse for touchy-feely stuff. Even for medical consultation and diagnosis, RLVR models' urge to reach a conclusion is often a liability.