Remix.run Logo
futureshock 3 hours ago

Well yes, that is exactly the point! The very purpose of the ARC AGI benchmarks is to find a pure reasoning task that humans are very good at and AI is very bad at. Companies then race each other to get a high score on that benchmark. Sure there’s going to be a lot of “studying for the test” and benchmaxing, but once a benchmark gets close to being saturated, ARC releases a new benchmark with a new task the AI is terrible at. This will rinse and repeat till ARC can find no reasoning task that AI cannot do that a human could. At that point we will effectively have AGI.

I believe the CEO of ARC has said they expect us to get to ARC-AGI-7 before declaring AGI.