I agree completely. I haven't noticed much improvement in coding ability in the last year. I'm using frontier models.

What's been the game changer is tools like Claude Code: automatic agentic tool loops purpose-built for coding. This is what I see as the impetus for mainstream adoption, rather than noticeable improvements in ability.

sho_hn 15 hours ago | parent | next [-]

My anecdotal experience is rather different.

I write a lot of C++ and QML code. Codex 5.3, only released in Feb, is the first model I've used that regularly generates code that passes my 25-year expert smell test, and it has turned generative coding from a time sink/nuisance into a tool I can somewhat rely on not to set me back.

Claude still wasn't quite there at the time, but I haven't tried 4.6 yet.

QML is a declarative-first markup language that is a superset of the JavaScript syntax. It's niche and doesn't have a giant amount of training data in the corpus. Codex 5.3 is the first model that doesn't badly botch it or prefer to write reams of procedural JS embeds (yes, after steering). Also much reduced is the tendency to go overboard and spam everything with clouds of helper functions/methods in both C++ and QML. It knows when to stop, so to speak, and is either trained or able to reason toward a more idiomatic ideal, with far less explicit instruction / AGENTS.md wrangling.

It's a huge difference. It might be the result of very specific optimization, or perhaps simultaneous advancements in the harness play a bigger role, but in my book, my neck of the woods (or place on the long tail) only really came online in 2026 as far as LLMs are concerned.

	▲	rubymamis 11 hours ago | parent [-]
		As a Qt C++ and QML developer myself[1], I've found Opus 4.6 thinking to be much better than any other model I've tested (Codex 5.3/GPT 5.4/Gemini 3.1 Pro). [1] https://rubymamistvalove.com/block-editor

mavamaarten 15 hours ago | parent | prev [-]

Maybe n=1, but I disagree? I notice that Sonnet 4.6 follows instructions much better than 4.5 and it generates code much closer to our already in-place production code.

It's just a point release and it isn't a significant upgrade in terms of features or capabilities, but it works... better for me.

	▲	ryanackley 14 hours ago | parent [-]
		Are you using a tool like Claude Code, Codex, or Windsurf? I ask because I've found their ability to pull in relevant context improves results in exactly the way you're describing. My own experience is that, at the micro level, some things get better and some get worse in perceived quality with each point release, e.g. 4.5 -> 4.6.