andai 19 hours ago:
If you look at benchmarks, the Claude models score significantly higher on intelligence per token. I'm not sure how that works exactly, but they sit apart from the entire rest of the chart on that metric: they seem to need fewer tokens to get the same result. (I can't speak to how that affects performance on very difficult tasks, though, since most of mine are pretty straightforward.)

So if you look at the total cost of running the benchmark, it's surprisingly similar to other models -- the higher price per token is offset by the significantly fewer tokens required to complete a task. See "Cost to Run Artificial Analysis Index" and "Intelligence vs Output Tokens" here: https://artificialanalysis.ai/

...with the obligatory caveat that benchmarks are largely irrelevant to actual real-world tasks, and you need to test the model on your actual task to see how well it does!
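For intuition, here's a minimal back-of-the-envelope sketch of that offset. All prices and token counts are hypothetical, made up for illustration -- they are not Artificial Analysis figures:

```python
# Hypothetical numbers for illustration only -- not actual benchmark data.
# Total cost = (output tokens used) * (price per token).

def benchmark_cost(output_tokens: int, price_per_million: float) -> float:
    """Cost in dollars to generate `output_tokens` at a given price per 1M tokens."""
    return output_tokens * price_per_million / 1_000_000

# Model A: pricier per token, but terse (token-efficient).
cost_a = benchmark_cost(output_tokens=20_000_000, price_per_million=15.0)

# Model B: cheaper per token, but needs ~3x the tokens for the same score.
cost_b = benchmark_cost(output_tokens=60_000_000, price_per_million=5.0)

print(f"Model A: ${cost_a:,.2f}")  # $300.00
print(f"Model B: ${cost_b:,.2f}")  # $300.00 -- same total despite a 3x price gap
```

With those (made-up) inputs, a 3x higher per-token price is exactly cancelled by needing a third of the tokens, which is the shape of the effect the charts show.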