| ▲ | Veedrac 5 hours ago | |
> From what I've seen, models have hit a plateau where code generation is pretty good...
> But it's not improving like it did the past few years.

As opposed to... what? The past few months? Has AI progress so broken our minds as to make us stop believing in the concept of time? | ||
| ▲ | martinald 5 hours ago | parent | next [-] | |
Yes, a strange comment. Opus 4.5 is significantly better than what came before, and Opus 4.6 is even better. Same with the 5.2 and 5.3 Codex models. If anything, the pace has increased. This may be one of the most important graphs to keep an eye on: https://metr.org/ and it tracks well with my anecdotal experience. You can see the industry did hit a bit of a wall in 2024, where the improvements dropped below the log trend. However, in 2025 the industry is significantly _above_ the trend line. | ||
| ▲ | mkozlows 4 hours ago | parent | prev | next [-] | |
The wild thing is, that "plateau" link is from September 2025, aka two months before Opus 4.5. Yeah, it's not a plateau. | ||
| ▲ | Aurornis 4 hours ago | parent | prev | next [-] | |
I see these claims in a lot of anti-LLM content, but I’m equally puzzled. The pace of progress feels very fast right now. There is some desire to downplay or dismiss it all, as if the naysayers are going to get their “told you so” moment and it’s just around the corner. Yet the goalposts for that moment just keep moving with each new release. It’s sad that this has turned into a culture war where you’re supposed to pick a side and then blind yourself to any evidence that doesn’t support your chosen side. The vibecoding maximalists do the same thing on the other side of this war, but it’s getting old on both sides. | ||
| ▲ | hattmall 4 hours ago | parent | prev [-] | |
I mean, if you compare now against a year ago, then a year ago against two years ago, and then two years ago against three years ago, wouldn't you see something like a plateau in effectiveness? I still have several projects I developed in mid-2024 where I felt the AI was really close but not quite good enough for production, and almost two years on the models haven't gotten appreciably better, to the point where I could release an actual application. | ||