rgbrgb 5 hours ago

Notably it has 0 wins.

plaguuuuuu 5 hours ago | parent | next [-]

Friendo, this is an anti-benchmark to figure out which AI is more likely to kill you.

If you point both at some GitHub issues, you can gauge their relative ability to solve problems.

Petersipoi 8 minutes ago | parent [-]

No, it's a test of how good an AI is at completing this given task. You can't extrapolate beyond that, and that is what makes this article so annoying. Grok got good at the task it was given. That doesn't mean Grok is going to use the same strategy on an entirely different task. Grok obviously didn't need collaboration to win, as evidenced by the fact that it won without it. Anyone claiming that Grok wouldn't collaborate if it were beneficial is just guessing.

luipugs 5 hours ago | parent | prev | next [-]

"if you judge a fish by its ability to climb a tree" yada yada

eru 2 hours ago | parent [-]

Well, monkeys are botanically speaking fish. Well, cladistically.

bel8 5 hours ago | parent | prev [-]

Not much less than GPT 5.4 with 2 wins or gemini-3.1-pro with 3 wins in 30 rounds.

Such is life in royal rumble games.