Remix.run Logo
lukev 3 hours ago

I'm not sure how this relates to AGI.

This measures the ability of a LLM to succeed in a certain class of games. Sure, that could be a valuable metric on how powerful (or even generally powerful) a LLM is.

Humans may or may not be good at the same class of games.

We know there exists a class of games (including most human games like checkers/chess/go) that computers (not LLMs!) already vastly outpace humans.

So the argument for whether a LLM is "AGI" or not should not be whether a LLM does well on any given class of games, but whether that class of games is representative of "AGI" (however you define that.)

Seems unlikely that this set of games is a definition meaningful for any practical, philosophical or business application?

piiritaja 2 hours ago | parent | next [-]

It's to do with how the creators of ARC-AGI defined intelligence. Chollet has said he thinks intelligence is how well you can operate in situations you have not encountered before. ARC-AGI measures how well LLMs operate in those exact situations.

Keyframe 11 minutes ago | parent [-]

To an extent, yes. Interdependent variables discovery and then hopefully systems modeling and navigating through such a system. If that's the case, then this is a simplistic version of it. How long until tests will involve playing a modern Zelda with quests and sidequests?

imiric 2 hours ago | parent | prev [-]

"AGI" is a marketing term, and benchmarks like this only serve to promote relative performance improvements of "AI" tools. It doesn't mean that performance in common tasks actually improves, let alone that achieving 100% in this benchmark means that we've reached "AGI".

So there is a business application, but no practical or philosophical one.