causal 2 hours ago

Thanks, I mostly agree with your approach except for one thing: eyesight feels like a "harness" that humans get to use and LLMs do not.

I'm guessing you did not give the human testers JSON blobs to work with, and I suspect they would also score 0% without the eyesight and visual-cortex harness feeding their reasoning ability.

fchollet 2 hours ago

I'm all for testing humans and AI on a fair basis; how about we restrict testing to robots physically coming to our testing center to solve the environments via keyboard / mouse / screen like our human testers? ;-)

(This version of the benchmark would be several orders of magnitude harder wrt current capabilities...)

causal 2 hours ago

Well, yes, and that would hand even more of an advantage to humans. My point is that designing a test around human advantages seems odd and orthogonal to measuring AGI.

adgjlsfhk1 an hour ago

The whole point of AGI is "general" intelligence, and for that intelligence to be broadly useful, it needs to exist within the context of a human-centric world.

causal 43 minutes ago

Then why deny it a harness it could also use in a human-centric world?

fc417fc802 2 hours ago

The human testers were provided with their customary inputs, as were the LLMs. I don't see the issue.

I guess it could be interesting to provide alternative versions that made available various representations of the same data. Still, I'd expect any AGI to be capable of ingesting more or less any plaintext representation interchangeably.

causal an hour ago

The issue is that ARC-AGI-3 specifically forbids harnesses that humans get to use.
