saberience 2 hours ago

So this is another ARC-"AGI" benchmark that is again designed around visual perception, aimed at LLMs which are trained to be great at text. What is the point?

Yes, we get that LLMs are really bad when you give them contrived visual puzzles or pseudo games to solve... Well great, we already knew this.

The "hype" around the ARC-AGI benchmarks makes me laugh, especially the idea that we would have AGI once ARC-AGI-1 was solved... then we got 2, and now we're on 3.

Can we start saying yet that these benchmarks have nothing to do with AGI? Are we going to get an ARC-AGI-10 where we have LLMs try to beat Myst or Riven? Will we have AGI then?

This isn't the right tool for measuring "AGI", and honestly I'm not sure what it's measuring except the foundation labs benchmaxxing on it.