disconcision 2 hours ago

I've yet to be convinced by any article, including this one, that attempts to draw boxes around what coding agents are and aren't good at in a way that is robust on a 6 to 12 month horizon.

I agree that the examples listed here are relatable, and I've seen similar things in my own use of various coding harnesses, including, to some degree, ones driven by Opus 4.5. But my general experience with using LLMs for development over the last few years has been a progression:

1. From models initially being able, at best, to assemble simple procedural or compositional sequences of commands or functions to accomplish a basic goal, perhaps meeting tests or type checking, but with no overall coherence,

2. To being able to structure small functions reasonably,

3. To being able to structure large functions reasonably,

4. To being able to structure medium-sized files reasonably,

5. To being able to structure large files, and small multi-file subsystems, somewhat reasonably.

So the idea that they are now falling down at the multi-module or multi-file or multi-microservice level is both not particularly surprising to me and not particularly indicative of future performance. There is a hierarchy of scales at which abstraction can be applied, and it seems plausible to me that the march of capability improvement is a continuous push upwards in the scale at which agents can reasonably abstract code.

Alternatively, it could be that there is a legitimate discontinuity here, at which anything resembling current approaches will max out, but I don't see strong evidence for it in this article.

Uehreka an hour ago | parent | next

It feels like a lot of people keep falling into the trap of thinking we’ve hit a plateau, and that they can shift from “aggressively explore and learn the thing” mode to “teach people solid facts” mode.

A week ago Scott Hanselman went on the Stack Overflow podcast to talk about AI-assisted coding. I generally respect that guy a lot, so I tuned in and… well it was kind of jarring. The dude kept saying things in this really confident and didactic (teacherly) tone that were months out of date.

In particular, I recall him making the "You're absolutely right!" joke and asserting that LLMs are generally very sycophantic, and I was like "Ah, I guess he's still on Claude Code and hasn't tried Codex with GPT 5". I haven't heard an LLM say anything like that since October, and in general I find GPT 5.x to actually be a huge breakthrough in terms of asserting itself when I'm wrong and not flattering my every decision. But that news (which would probably be really valuable to many people listening) wasn't mentioned on the podcast, I guess because neither of the guys had tried Codex recently.

And I can’t say I blame them: It’s really tough to keep up with all the changes but also spend enough time in one place to learn anything deeply. But I think a lot of people who are used to “playing the teacher role” may need to eat a slice of humble pie and get used to speaking in uncertain terms until such a time as this all starts to slow down.

orbital-decay 9 minutes ago | parent | next

> in general I find GPT 5.x to actually be a huge breakthrough in terms of asserting itself when I’m wrong

That's just a different bias purposefully baked into GPT-5's engineered personality during post-training. It always tries to contradict the user, including in cases where it's confidently wrong, and keeps justifying the wrong result in a funny manner if pressed or argued with (as in, it would never have made that obvious mistake if it weren't bickering with the user). GPT-5.1 in particular was very strongly finetuned to do this. And in longer replies or multi-turn conversations, it falls into a loop of contradictory behavior far too easily. This is no better than sycophancy, and the fundamental LLM phenomena (in-context learning, repetition, serial-position biases, etc.) haven't really changed; they also vary a lot from model to model due to subtle architectural and training differences. LLMs need an order of magnitude better nuance, calibration, and training.

MoltenMan 28 minutes ago | parent | prev

I agree with a lot of what you've said, but I completely disagree that LLMs are no longer sycophantic. GPT-5 is definitely still very sycophantic; "You're absolutely right!" still happens, etc. It's true it happens far less in a pure coding context (Claude Code / Codex), but I suspect that's only because of the system prompts, and those tools are by far in the minority of LLM usage.

I think it's enlightening to open up ChatGPT on the web with no custom instructions and just send a regular request and see the way it responds.

groby_b an hour ago | parent | prev

LLMs have been bad at creating abstraction boundaries since inception, and people have been calling it out since inception. (Heck, even I have a Twitter post somewhere >12 months old calling that out, and I'm not exactly a leading light of the effort.)

It is in no way size-related. The technology cannot create new concepts/abstractions, and so fails at abstraction. Reliably.

w0m an hour ago | parent

I believe his argument is that now that you've defined the limitation, it's a ceiling that will likely be cracked in the relatively near future.

emp17344 37 minutes ago | parent

Well, hallucinations have been identified as an issue since the inception of LLMs, so this doesn't appear to be true.