d_burfoot 5 days ago

To me the reason ARC-AGI puzzles are difficult for LLMs and possible for humans is that they are expressed in a format for which humans have powerful preprocessing capabilities.

Imagine the puzzle layouts were expressed in JSON instead of as a pattern of visual blocks. How many humans could solve them in that case?
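To make the contrast concrete, here is a minimal sketch (the grid data and glyph mapping are hypothetical, not from any actual ARC-AGI task) showing the same puzzle grid as raw JSON and as a rendered block pattern. The diagonal is invisible in the JSON but obvious in the rendering:

```python
import json

# Hypothetical ARC-style grid: integers denote cell colors.
grid_json = '[[0, 0, 3], [0, 3, 0], [3, 0, 0]]'
grid = json.loads(grid_json)

# Render the same data as visual blocks: each color becomes a
# distinct glyph, so the diagonal structure jumps out at a glance.
glyphs = {0: "·", 3: "█"}
rendered = "\n".join("".join(glyphs[cell] for cell in row) for row in grid)
print(rendered)
# ··█
# ·█·
# █··
```

The point being: human visual preprocessing does this transformation for free, while an LLM consuming the JSON token stream gets no such help.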

jononor 5 days ago | parent | next [-]

We have powerful preprocessing blocks for images too: strong computer vision capabilities predate LLMs by several years. Image classification, segmentation, object detection, etc. All differentiable and trainable in the same way as LLMs, including jointly. To the best of my knowledge, no team has shown really high scores by adding an image preprocessing block?

pessimizer 5 days ago | parent | prev | next [-]

Everyone who had access to a computer that could convert the JSON into something more readable for humans, and who knew that was the first thing they needed to do?

You might as well have asked how many English speakers could solve the questions if they were in Chinese. All of them. They would call up someone who spoke Chinese, pay them to translate the questions, then solve them. Or, failing that, they would go to the bookstore, buy books on learning Chinese, and solve them three years from now.

kenjackson 5 days ago | parent | prev | next [-]

Bingo. We simply made a test for which we are well trained. We are constantly making real-time decisions with our eyes. Interestingly, certain monkeys are much better at certain kinds of visual pattern recognition than we are. They might laugh and think humans haven't reached AGI yet.
