kqr 8 hours ago
It was never that great, it seems. For all of 2025 there was virtually no improvement in the rate at which models produced quality code. They only got better at passing automated tests.
civvv 5 hours ago
This is likely true. I think model quality has stagnated and that it's likely a non-trivial task to find a new improvement vector. Scaling the width of the model (which has been the driving force behind the speed of improvement thus far) seems to have reached its limit. It will be interesting to see the implications of this. Tooling can only do so much in the long term.
cjsaltlake 5 hours ago
But that's an enormous source of coding productivity, and it's why Anthropic is worth billions... The reason SWE-bench has been so successful and useful for coding is that software engineering has a ton of tradition and infrastructure for making and using automated tests.
greenchair an hour ago
Maybe this is why these companies' pricing plans are getting more limited and expensive.