saberience 4 hours ago

Arc-AGI (and Arc-AGI-2) is the most overhyped benchmark around though.

It's completely misnamed. It should be called useless visual puzzle benchmark 2.

Firstly, it's a visual puzzle, which makes it way easier for humans than for models trained on text. Secondly, it's not really that obvious or easy for humans to solve themselves!

So the idea that if an AI can solve "Arc-AGI" or "Arc-AGI-2" it's super smart or even "AGI" is frankly ridiculous. It's a puzzle that basically means nothing, other than that the models can now solve "Arc-AGI".

CuriouslyC 4 hours ago | parent [-]

The puzzles are calibrated for human solve rates, but otherwise I agree.

saberience 4 hours ago | parent [-]

My two elderly parents cannot solve Arc-AGI puzzles, but can manage to navigate the physical world, their house, garden, make meals, clean the house, use the TV, etc.

I would say they do have "general intelligence", so whatever Arc-AGI is "solving", it's definitely not "AGI".

hmmmmmmmmmmmmmm 3 hours ago | parent [-]

You are confusing fluid intelligence with crystallised intelligence.

casey2 3 hours ago | parent [-]

I think you are making that confusion. Any robotic system in the place of his parents would fail within a few hours.

There are more novel tasks in a day than ARC provides.

hmmmmmmmmmmmmmm 3 hours ago | parent [-]

Children have great levels of fluid intelligence; that's how they are able to learn to quickly navigate a world that they are still very new to. Seniors with decreasing capacity increasingly rely on crystallised intelligence; that's why they can still perform tasks like driving a car but can fail at completely novel tasks, sometimes even using a smartphone if they have not used one before.

zeroonetwothree 2 hours ago | parent [-]

It really depends on motivation. My 90-year-old grandmother can use a smartphone just fine since she needs it to see pictures of her (great) grandkids.