girvo 2 hours ago

Yeah I'm quite surprised as to how all of those are supposed to be considered problems. They all make sense to me if we're trying to judge whether these tools are AGI, no?

andy12_ 2 hours ago | parent | next [-]

I think that any logic-based test that your average human can "fail" (i.e., score below 50% on) is not exactly testing whether something is AGI. Though I suppose it depends on your definition of AGI (and whether all humans, or at least your average human, count as AGI under that definition).

chillfox 10 minutes ago | parent [-]

If I had a puzzle I really needed solved, then I would not ask a rando on the street, I would ask someone I know is really good at puzzles.

My point is: For AGI to be useful, it really should be able to perform at the top 10% or better level for as many professions as possible (ideally all of them).

An AI that can only perform at the average human level is useless unless it can be trained for the job like humans can.

benjaminl an hour ago | parent | prev [-]

The issue here is that people have different definitions of AGI. Going by the description, getting 100% on this benchmark would be more than AGI: it would qualify as ASI (Artificial Superintelligence).

fc417fc802 a few seconds ago | parent | next [-]

If you only outdo humans 50% of the time, you're never going to get consensus on whether you've qualified. Whereas outdoing 90% of humans on 90% of the most difficult tasks we could come up with is going to be difficult to argue against.

This benchmark is only one such task. After this one there's still the rest of that 90% to go.

Beating humans isn't anywhere near sufficient to qualify as ASI. That's an entirely different league with criteria that are even more vague.

foltik 19 minutes ago | parent | prev | next [-]

I’d be hesitant to call that ASI if it’s pretty obvious how you’d write a regular old program to solve it.

throwuxiytayq 31 minutes ago | parent | prev [-]

People are still debating whether these models exhibit any kind of intelligence and any kind of thinking. Setting the bar higher than necessary is welcome, but at this point I’m pretty sure everyone’s opinions are set in stone.