jeremyjh 5 hours ago
I think that’s mostly because they get so much more of that reinforcement learning, since it is so economical. I don’t know if there is any evidence of a fundamental reason they can’t be just as good at other tasks, but it might be economically infeasible for a while yet.
mjburgess 4 hours ago
No one is curating vast amounts of data for them in other domains. Programmers send programs with fixes.
emp17344 3 hours ago
RLVR (reinforcement learning with verifiable rewards) doesn’t work for unverifiable tasks, so they won’t be able to effectively use tools to boost reliability for those tasks.