alfalfasprout 2 hours ago
I'm actually studying this right now :) Honestly... not that dramatically. Each release is a much more marginal improvement, and the quoted official benchmarks don't translate very well to the real world. 4.7 regressed hard in some ways. A compounding factor is that the Claude Code harness seems to nerf the model after a few months, probably to reduce token use. So far 4.8 seems less verbose, but we'll see what that meaningfully translates into in practice.