Remix.run Logo
Bjorkbat a day ago

I was about to comment on a past remark from Anthropic that the whole reason for the convoluted naming scheme was because they wanted to wait until they had a model worth of the "Claude 4" title.

But because of all the incremental improvements since then, the irony is that this merely feels like an incremental improvement. It obviously is a huge leap when you consider that the best Claude 3 ever got on SWE-verified was just under 20% (combined with SWE-agent), but compared to Claude 3.7 it doesn't feel like that big of a deal, at least when it comes to SWE-bench results.

Is it worthy? Sure, why not, compared to the original Claude 3 at any rate, but this habit of incremental improvement means that a major new release feels kind of ordinary.