| ▲ | XCSme 2 hours ago | |||||||
Gets 10/10 on my potato benchmarks: https://aibenchy.com/model/google-gemini-3-1-pro-preview-med... | ||||||||
| ▲ | XCSme 3 minutes ago | parent | next [-] | |||||||
Added one more test, which, surprisingly, gemini flash 3 reasoning passes but gemini 3.1 pro does not. | ||||||||
| ▲ | XCSme 2 hours ago | parent | prev | next [-] | |||||||
Now I need to write more tests. It's a bit hard to trick reasoning models, because they explore many angles of a problem and might accidentally have an "a-ha" moment that puts them on the right path. It's a bit like sampling random starting points, running gradient descent from each, and stumbling upon the right result. | ||||||||
| ▲ | thevinter 17 minutes ago | parent | prev [-] | |||||||
Are you intentionally keeping the benchmarks private? | ||||||||