| ▲ | futureshock 3 hours ago | |
Well yes, that is exactly the point! The very purpose of the ARC AGI benchmarks is to find a pure reasoning task that humans are very good at and AI is very bad at. Companies then race each other to get a high score on that benchmark. Sure there’s going to be a lot of “studying for the test” and benchmaxing, but once a benchmark gets close to being saturated, ARC releases a new benchmark with a new task the AI is terrible at. This will rinse and repeat till ARC can find no reasoning task that AI cannot do that a human could. At that point we will effectively have AGI. I believe the CEO of ARC has said they expect us to get to ARC-AGI-7 before declaring AGI. | ||