u1hcw9nx 2 hours ago
>The results of this paper should not be interpreted as suggesting that AI can consistently solve research-level mathematics questions. In fact, our anecdotal experience is the opposite: success cases are rare, and an apt intuition for autonomous capabilities (and limitations) may currently be important for finding such cases. The papers (ACGKMP26; Feng26; LeeSeo26) grew out of spontaneous positive outcomes in a wider benchmarking effort on research-level problems; for most of these problems, no autonomous progress was made.
noosphr 15 minutes ago
I've been at this longer than just about anyone. After three major generations of models, the "intuition" I've built isn't about what AI can do, but about what a specific model family can do. No one cares what the gotchas in gpt3 are because it's a stupid model. In two years no one will care what they were for gpt5 or Claude 4, for the same reason. We currently have the option of wasting months of our lives getting good at a specific model, or burning millions trying to get those models to do things by themselves. Neither option is viable long term.
getnormality an hour ago
The ridiculous resources being thrown at this, and the ability through RLVR to throw gigatons of spaghetti at the wall to see what sticks, should make it very clear just how incredibly inefficient frontier AI reasoning is - however spectacular it may be that it can reason at this level at all.
| ||||||||
thereitgoes456 34 minutes ago
I credit them for acknowledging their limitations and not actively trying to be misleading. Unlike a certain other company in the space.