mNovak 3 hours ago

I'm excited for the big jump in ARC-AGI scores from recent models, but no one should think for a second this is some leap in "general intelligence".

I joke to myself that the G in ARC-AGI is "graphical". I think what's held back models on ARC-AGI is their terrible spatial reasoning, and I'm guessing that's what the recent models have cracked.

Looking forward to ARC-AGI 3, which focuses on trial and error and exploring a set of constraints via games.

causal 3 hours ago | parent | prev | next [-]

Agreed. I love the elegance of ARC, but it always felt like a gotcha to give spatial reasoning challenges to token generators, and the fact that the token generators are somehow beating it anyway really says something.

throw310822 3 hours ago | parent | prev | next [-]

The average ARC AGI 2 score for a single human is around 60%.

"100% of tasks have been solved by at least 2 humans (many by more) in under 2 attempts. The average test-taker score was 60%."

https://arcprize.org/arc-agi/2/

modeless 2 hours ago | parent | next [-]

Worth keeping in mind that in this case the test takers were random members of the general public. The score of e.g. people with bachelor's degrees in science and engineering would be significantly higher.

throw310822 2 hours ago | parent [-]

Random members of the public = average human beings. I thought those were already classified as General Intelligences.

imiric an hour ago | parent | prev [-]

What is the point of comparing performance of these tools to humans? Machines have been able to accomplish specific tasks better than humans since the industrial revolution. Yet we don't ascribe intelligence to a calculator.

None of these benchmarks prove these tools are intelligent, let alone generally intelligent. The hubris and grift are exhausting.

throw310822 an hour ago | parent | next [-]

> Machines have been able to accomplish specific tasks...

Indeed, and the specific task machines are accomplishing now is intelligence. Not yet "better than human" (and certainly not better than every human) but getting closer.

imiric 26 minutes ago | parent [-]

> Indeed, and the specific task machines are accomplishing now is intelligence.

How so? This sentence, like most of this field, is making baseless claims that are more aspirational than true.

Maybe it would help if we could first agree on a definition of "intelligence", yet we don't have a reliable way of measuring that in living beings either.

If the people building and hyping this technology had any sense of modesty, they would present it as what it actually is: a large pattern matching and generation machine. This doesn't mean that this can't be very useful, perhaps generally so, but it's a huge stretch and an insult to living beings to call this intelligence.

But there's a great deal of money to be made on this idea we've been chasing for decades now, so here we are.

warkdarrior 10 minutes ago | parent [-]

> Maybe it would help if we could first agree on a definition of "intelligence", yet we don't have a reliable way of measuring that in living beings either.

How about this specific definition of intelligence?

   Solve any task provided as text or images.

AGI would be to achieve that faster than an average human.

guelo an hour ago | parent | prev [-]

What's the point of denying or downplaying that we are seeing amazing and accelerating advancements in areas that many of us thought were impossible?

colordrops 3 hours ago | parent | prev [-]

Wouldn't you deal with spatial reasoning by giving it access to a tool that structures the space in a way it can understand, or that is simply a sub-model that can do spatial reasoning? These "general" models would serve as the frontal cortex while other models do specialized work. What is missing?

amelius 3 minutes ago | parent | next [-]

They should train more on sports commentary; perhaps that could give spatial reasoning a boost.

causal 3 hours ago | parent | prev [-]

That's a bit like saying just give blind people cameras so they can see.