| ▲ | lambda 3 hours ago | |||||||
Which Opus? They certainly outperform Claude 3 Opus. Anyhow, feel free to try them out head to head on OpenRouter. I'd love to see someone write up their results comparing a modern, local-sized open-source model against frontier models from ~a year ago, on something other than the standard benchmarks. | ||||||||
| ▲ | mapontosevenths 2 hours ago | |||||||
There's a guy on YouTube named Bijan Bowen who tests all the models (open and frontier) on a series of one/few-shot programming exercises and has been doing so for a long while now. You can pretty much watch him compare the results for any two models you're likely to be interested in. I'm not affiliated, I just like his style and have found it handy. I know it's not very rigorous, but it's good enough for me, and I've found his examples match the results I see in real life pretty closely. | ||||||||
| ||||||||
| ▲ | MrScruff 3 hours ago | |||||||
I’m normally comparing frontier open/cheap models against frontier closed-source ones. I use DeepSeek/GLM regularly; they’re fine and you can get real work done with them, but it’s super obvious when you switch back to Opus or even Sonnet. A 3B-active-param MoE model is not comparable. | ||||||||
| ||||||||