| ▲ | cyanydeez 5 hours ago | |
OK, but aren't you just measuring efficiency, and not improvements to the big I in AGI? | ||
| ▲ | Leynos 2 hours ago | parent | next [-] | |
It also measures task coherence—ability to plan, form contingencies, recover from errors, mitigate accumulation of errors, and reconcile findings across a long context window. | ||
| ▲ | jsnell 4 hours ago | parent | prev | next [-] | |
No? I think you're misunderstanding what is being measured. It is purely a test of capabilities (can it do a thing that takes a human $X hours), not efficiency (how fast will it do it). | ||
| ▲ | lukan 5 hours ago | parent | prev [-] | |
Yes, but this study was not about that, and "just efficiency" is actually what most people are after. At least I want AI to solve my problems, not score high on an academic leaderboard. | ||