François Chollet, creator of ARC-AGI, has consistently said that solving the benchmark does not mean we have AGI. It has always been meant as a stepping stone to encourage progress in the correct direction rather than as an indicator of reaching the destination. That's why he is working on ARC-AGI-3 (to be released in a few weeks) and ARC-AGI-4.

His definition of reaching AGI, as I understand it, is when it becomes impossible to construct the next version of ARC-AGI because we can no longer find tasks that are feasible for normal humans but unsolved by AI.

▲

beklein 3 hours ago | parent | next [-]

https://x.com/fchollet/status/2022036543582638517

	▲	joelthelion 2 hours ago \| parent [-]
		Do opus 4.6 or gemini deep think really use test time adaptation ? How does it work in practice?

▲

mapontosevenths 2 hours ago | parent | prev | next [-]

> His definition of reaching AGI, as I understand it, is when it becomes impossible to construct the next version of ARC-AGI because we can no longer find tasks that are feasible for normal humans but unsolved by AI.

That is the best definition I've yet to read. If something claims to be conscious and we can't prove it's not, we have no choice but to believe it.

Thats said, I'm reminded of the impossible voting tests they used to give black people to prevent them from voting. We dont ask nearly so much proof from a human, we take their word for it. On the few occasions we did ask for proof it inevitably led to horrific abuse.

Edit: The average human tested scores 60%. So the machines are already smarter on an individual basis than the average human.

▲

estearum 2 hours ago | parent | next [-]

> If something claims to be conscious and we can't prove it's not, we have no choice but to believe it.

This is not a good test.

A dog won't claim to be conscious but clearly is, despite you not being able to prove one way or the other.

GPT-3 will claim to be conscious and (probably) isn't, despite you not being able to prove one way or the other.

	▲	dullcrisp an hour ago \| parent [-]
		An LLM will claim whatever you tell it to claim. (In fact this Hacker News comment is also conscious.) A dog won’t even claim to be a good boy.

▲

WarmWash 2 hours ago | parent | prev | next [-]

>because we can no longer find tasks that are feasible for normal humans but unsolved by AI.

"Answer "I don't know" if you don't know an answer to one of the questions"

	▲	mrandish 6 minutes ago \| parent [-]
		I've been surprised how difficult it is for LLMs to simply answer "I don't know." It also seems oddly difficult for them to 'right-size' the length and depth of their answers based on prior context. I either have to give it a fixed length limit or put up with exhaustive answers.

▲

sva_ 2 hours ago | parent | prev | next [-]

> Edit: The average human tested scores 60%. So the machines are already smarter on an individual basis than the average human.

I think being better at this particular benchmark does not imply they're 'smarter'.

▲

criddell an hour ago | parent | prev | next [-]

> The average human tested scores 60%. So the machines are already smarter on an individual basis than the average human.

Maybe it's testing the wrong things then. Even those of use who are merely average can do lots of things that machines don't seem to be very good at.

I think ability to learn should be a core part of any AGI. Take a toddler who has never seen anybody doing laundry before and you can teach them in a few minutes how to fold a t-shirt. Where are the dumb machines that can be taught?

▲

woah 2 hours ago | parent | prev [-]

> If something claims to be conscious and we can't prove it's not, we have no choice but to believe it.

Can you "prove" that GPT2 isn't concious?

	▲	mapontosevenths an hour ago \| parent [-]
		If we equate self awareness with consciousness then yes. Several papers have now shown that SOTA models have self awareness of at least a limited sort. [0][1] As far as I'm aware no one has ever proven that for GPT 2, but the methodology for testing it is available if you're interested. [0]https://arxiv.org/pdf/2501.11120 [1]https://transformer-circuits.pub/2025/introspection/index.ht...

▲

hmmmmmmmmmmmmmm 3 hours ago | parent | prev [-]

I don't think the creator believes ARC3 can't be solved but rather that it can't be solved "efficiently" and >$13 per task for ARC2 is certainly not efficient.

But at this rate, the people who talk about the goal posts shifting even once we achieve AGI may end up correct, though I don't think this benchmark is particularly great either.