shadeslayer_ 5 hours ago
Do these benchmarks even add any value at this point? This one is basically Cursor saying that their model is as good as the frontier ones at a fraction of the price. The independent benchmarks are probably part of the training data now, and the models are pattern-matching against them all the time. The final test of a model (and the harness, probably) is how well it works FOR YOU. Since most of the models can handle most of our daily tasks, it boils down to which one has the least friction to use.