mdp2021 | 4 days ago:
Sure, but you wrote:

> If anything, this shows that some LLMs might win against humans because they can spend more time thinking per wall clock time interval thanks to the underlying hardware. Not because they are fundamentally smarter.

You interpreted "smarter" the IQ way: results under time constraints. But what we actually get is an indicator of whether the LLM can, given enough time, reach the result at all. That is the interpretation of "smarter" that many of us need. (Of course, it remains to be seen whether the ability to achieve those contextual results carries over to the problems we actually need solved.)
sigmoid10 | 4 days ago:
No, you misunderstood. I'm saying that for reasoning models there is a lot of untapped capability in this test. I wouldn't be so sure there are hard limits here: given enough compute, you'll probably find that a modern high-end model reaches 100%. But you probably don't want to spend thousands (or perhaps tens of thousands) of dollars on that. There are much better tests out there if you have money to burn and want to find true hard limits compared to humans.