Donald 2 days ago

Gemini 3 Pro Preview gets 96.8% on the same benchmark? That's impressive

capitainenemo 2 days ago | parent | next [-]

And it performs very well on the latest 100 puzzles too, so it isn't just learning the data set (unless I guess they routinely index this repo).

I wonder how well AIs would do at Bracket City. I tried Gemini on it and was underwhelmed. It made a lot of terrible connections and often bled data from one level into the next.

wooger a day ago | parent [-]

> unless I guess they routinely index this repo

This sounds like exactly the kind of thing any tech company would do when confronted with a competitive benchmark.

rsanek a day ago | parent [-]

I mean, the repo has <200 stars, it's not like it's so mainstream that you'd expect LLM makers to be watching it actively. If they wanted to game it, they could more easily do that in RL with synthetic data anyway.

bigyabai 2 days ago | parent | prev [-]

GPT-5.2 might be Google's best Gemini advertisement yet.

outside1234 2 days ago | parent [-]

Especially when you see the price