omega3 6 hours ago
Benchmarks suggest they are comparable: https://artificialanalysis.ai/?models=claude-opus-4-6-adapti... But let's say, for the sake of discussion, that Opus is much better. That still doesn't justify the price disparity, especially considering that the other models are served by commercial inference providers while Anthropic's is in-house.
cbg0 5 hours ago
Try doing real work with them; it's a night-and-day difference, especially for systems programming. The non-frontier models do a lot of benchmaxxing to look good.
xienze 6 hours ago
> Benchmarks suggest they are comparable

The problem here is that people think AI benchmarks are analogous to, say, CPU performance benchmarks. They're not:

* You can't control all the variables, only one (the prompt).

* The outputs, BY DESIGN, can fluctuate wildly for no apparent reason (i.e., utter failure on the first run, success on the second; see the sketch below).

* Biggest of all: once a benchmark is known, future iterations of the model will be trained on it.

Trying to objectively measure model performance is a fool's errand.
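To make the fluctuation point concrete, here's a minimal sketch (pure Python stdlib; the 60% per-task pass rate and 100-task suite size are made-up assumptions, not numbers from any real benchmark) showing how much a single-run score can swing from sampling noise alone:

    import random, statistics

    random.seed(0)
    TRUE_PASS_RATE = 0.6   # hypothetical per-task pass probability
    TASKS = 100            # hypothetical benchmark suite size

    def run_benchmark():
        # each task passes or fails stochastically, like a sampled LLM output
        return sum(random.random() < TRUE_PASS_RATE for _ in range(TASKS)) / TASKS

    scores = [run_benchmark() for _ in range(20)]
    print(f"min={min(scores):.2f} max={max(scores):.2f} "
          f"stdev={statistics.stdev(scores):.3f}")

With a binomial stdev of sqrt(0.6 * 0.4 / 100), roughly 5 points, two single-run scores on the same model can easily differ by 10+ points before you even get to prompt sensitivity or training-set contamination.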