dkhenry | 6 hours ago
You make a compelling argument, but thankfully I have data to back up my anecdotal experience. This comparison shows them neck and neck: https://benchlm.ai/compare/claude-sonnet-4-5-vs-gemma-4-31b As does this one: https://llm-stats.com/models/compare/claude-sonnet-4-6-vs-ge... And even the pelican benchmark shows them pretty close: https://simonwillison.net/2026/Apr/2/gemma-4/ https://simonwillison.net/2025/Sep/29/claude-sonnet-4-5/ Also, this isn't a fringe statement; most people who have done an evaluation agree with me.
jmward01 | 4 hours ago | parent
One area I find hard to get around is context length. Everything self-hosted is so limited in context that it's marginal to use. Additionally, the tools (like Claude Code) are clearly in the training mix for Anthropic's models, so they seem to get a boost over other models pushed into that environment. That being said, open source and local inference is -really- good and only going to get better. There is no doubt that the current frontier biz model is not sustainable.