michaelbuckbee 15 hours ago
I was trying to get a better sense of the time/cost/quality trade-off between these, so I threw together a quick eval of Sonnet 4.6, Mistral's dev model, and Opus 4.7 (figuring it's what you'd use if you were on Max). The results for implementing and testing a Levenshtein distance function in JavaScript are pretty similar, but Mistral is 30x cheaper than Opus 4.7 and 4x faster than Sonnet 4.6.
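For reference, the task in that eval is roughly the following. This is a minimal sketch of a Levenshtein distance function in JavaScript, not the output of any of the models mentioned; the function name `levenshtein` and the two-row approach are assumptions chosen for illustration.

```javascript
// Hypothetical reference implementation, not taken from the eval itself.
// Computes the edit distance between two strings using two rolling rows.
// Note: strings are compared by UTF-16 code unit, not by grapheme.
function levenshtein(a, b) {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    // Turning the first i chars of a into an empty string takes i deletions.
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("kitten", "sitting"));
```

A handful of assertions on known pairs (empty strings, identical strings, the classic "kitten"/"sitting" pair) covers most of what a test for this would check, which is part of why the task is small.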
kaoD 5 hours ago
But that's not very informative. Levenshtein distance is not only a well-understood problem, it's small, self-contained, and extremely well-represented in the training data. It's the kind of problem where even small/bad models can excel. The gold standard for those tasks is just "use a library", so no wonder the beefy models are expensive: you're chartering a commercial airplane to go grocery shopping. My personal benchmarks are software engineering tasks (ideally spanning multiple packages in a monorepo) composed of many small decisions that, compounded, make or break the implementation and long-term maintainability. That's where even frontier models struggle, which makes comparisons meaningful.
KronisLV 14 hours ago
The one detail I did forget to mention is that if anyone goes with the Mistral subscription (instead of paying per-token), then the Mistral Vibe tool gives you their Medium 3.5 model by default, with a 200k token context. It will probably be enough for plenty of tasks, though there's also a noticeable difference between that and a context of up to 1M tokens.