"It is a bit arbitrary, but I think this is what they're tracking."

I don't know if they can get their numbers right this way, but this seems like a far more useful metric than theoretical capabilities.

OK, but aren't you just measuring efficiency, and not improvements in the big "I" in AGI?

	Leynos 2 hours ago
		It also measures task coherence—ability to plan, form contingencies, recover from errors, mitigate accumulation of errors, and reconcile findings across a long context window.
	jsnell 4 hours ago
		No? I think you're misunderstanding what is being measured. It is purely a test of capabilities (can it do a thing that takes a human $X hours), not efficiency (how fast will it do it).
	lukan 5 hours ago
		Yes, but this study was not about that, and "just efficiency" is actually what most people are after. At least I want AI to solve my problems, not score high on an academic leaderboard.