| ▲ | BoorishBears 5 hours ago |
| This guy has a terrible, broken benchmark that gets hawked every release, and I wish HN would ban accounts that essentially exist to hawk a personally owned site, especially such a bad one. |
|
| ▲ | pbgcp2026 39 minutes ago | parent | next [-] |
| I get similar results in my own tests. And Gemini 3.1 Pro is consistently at the top of my ratings. Not everyone is a coding monkey; I prefer staying a programmer. |
| |
| ▲ | BoorishBears 15 minutes ago | parent [-] |
| They're referencing Gemini 3.5 Flash being the top model; you must not be great with detail. And no (strong) programmer would jump to assuming other people are coding monkeys just because they disagree about what a strong LLM is: that's the kind of thinking reserved for the glorified coding monkeys who wasted their lives getting better at writing CRUD apps and are now upset that someone's tooling is lowering the already very low bar there. |
|
|
| ▲ | UqWBcuFx6NV4r 5 hours ago | parent | prev [-] |
| If you were right, the karma system would largely take care of this. It really sounds like this is more of a personal view. |
| |
| ▲ | BoorishBears 4 hours ago | parent [-] |
| Karma systems are never perfect, and most people will not assume this is a pattern (i.e., they won't feel the need to downvote them just for having yet another crappy AI benchmark). I only recognize it because I build a product that leaves me looking for information on every major release... and every major release a new crop of folks reply, confused about the anomalies on top of anomalies that they're seeing, and they slowly learn this person is just way more unserious than the dogged distribution would imply. |
|