make3 | 8 hours ago
Absolutely not on par; you're smoking something.
dkhenry | 7 hours ago
You make a compelling argument, but thankfully I have data to back up my anecdotal experience. This comparison shows them neck and neck:
https://benchlm.ai/compare/claude-sonnet-4-5-vs-gemma-4-31b
as does this one:
https://llm-stats.com/models/compare/claude-sonnet-4-6-vs-ge...
And the pelican benchmark shows them pretty close too:
https://simonwillison.net/2026/Apr/2/gemma-4/
https://simonwillison.net/2025/Sep/29/claude-sonnet-4-5/
Also, this isn't a fringe statement; you can see that most people who have done an evaluation agree with me.
jmward01 | 6 hours ago
One area I find hard to get around is context length. Everything self-hosted is so limited on context that it's marginal to use. Additionally, the tools (like Claude Code) are clearly in the training mix for Anthropic's models, so they seem to get a boost over other models pushed into that environment. That being said, open-source models and local inference are -really- good and only going to get better. There is no doubt that the current frontier business model is not sustainable.
make3 | an hour ago
If you look at the details of the numbers in the benchmarks you shared, Sonnet 4.5 crushes Gemma 4. Somehow the first link doesn't run Sonnet on the multimodal benchmark; that's why the top scores look close, but it beats Gemma on every benchmark they actually ran. The arena in the second link shows that it destroys Gemma 4 as well; it's not close.
lostmsu | 8 hours ago
Just to be clear, did you notice the parent said 4.5?
cmorgan31 | 7 hours ago
They are also on par on a lot of classification tasks. I did have to actually use Gemma 4 and fine-tune it a bit, but that's part of the value add.
make3 | an hour ago
I did; what's your point?