gortok 18 hours ago

While there's not a lot of meat on the bone for this post, one section of it reflects the overall problem with the idea of Claude-as-everything:

> I spent weeks casually trying to replicate what took years to build. My inability to assess the complexity of the source material was matched by the inability of the models to understand what it was generating.

When the trough of disillusionment hits, I anticipate this will become collective wisdom, and we'll tailor LLMs to the subset of uses where they can be more helpful than hurtful. Until then, we'll try to use AI to replace in weeks what took us years to build.

samdjstephens 14 hours ago | parent | next [-]

If LLMs stopped improving today, I’m sure you would be correct; as it is, I think it’s very hard to predict what the future holds and where the advancements take us.

I don’t see a particularly good reason why LLMs wouldn’t be able to do most programming tasks, with the limitation being our ability to specify the problem sufficiently well.

maccard 14 hours ago | parent | next [-]

I feel like we’ve been hearing this for 4 years now. The improvements to programming (IME) haven’t come from improved models, they’ve come from agents, tooling, and environment integrations.

bigiain 5 hours ago | parent | next [-]

> I feel like we’ve been hearing this for 4 years now.

I feel we were hearing very similar claims 40 years ago, about how the next generation of "Fourth Generation Languages" was going to enable business people and managers to write their own software without needing pesky programmers to do it for them. They'd "just" need to learn how to specify the problem sufficiently well.

(Where "just" is used in it's "I don't understand the problem well enough to know how complicated or difficult what I'm about to say next is" sense. "Just stop buying cigarettes, smoker!", "Just eat less and exercise more, fat person!", "Just get a better paying job, poor person!", "Just cheer up, depressed person!")

dwohnitmok 4 hours ago | parent | prev | next [-]

> The improvements to programming (IME) haven’t come from improved models, they’ve come from agents, tooling, and environment integrations.

I disagree. This is almost entirely down to model capability increases. I've stated this elsewhere: https://news.ycombinator.com/item?id=46362342

Improved tooling and agent scaffolds are symptoms of improved model capabilities, not the cause of them. Put a 2023-era model such as GPT-4, or even a 2024-era model such as Sonnet 3.5, into today's tooling and it would crash and burn.

Scaffolding and tooling for these models have been tried in different forms and prototypes ever since GPT-3 came out in 2020. The only reason they're taking off in 2025 is that models are finally capable enough to use them.

elAhmo 13 hours ago | parent | prev [-]

Both are true. Models have also improved significantly in the last year alone, never mind compared to 4 years ago. Agents, tooling, and other sugar on top are just that: they enable more efficient and creative usage. But let's not understate how much better today's models are compared to what was available in the past.

majormajor 10 hours ago | parent | next [-]

How do you judge model improvements vs tooling improvements?

If you're not working at one of the big players or running your own models, it appears that even the APIs these days are wrapped in layers of tooling, abstracting raw model access more than ever.

dwohnitmok 4 hours ago | parent [-]

> even the APIs these days are wrapped in layers of tooling, abstracting raw model access more than ever.

No, the APIs for these models haven't really changed all that much since 2023. The de facto standard for the field is still the chat completions API that was released in early 2023. It is almost entirely model improvements, not tooling improvements, that are driving things forward. Tooling improvements are basically entirely dependent on model improvements (if you were to stick GPT-4, Sonnet 3.5, or any other pre-2025 model into today's tooling, things would suck horribly).
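For concreteness, the shape of a call today is still basically what it was then. A minimal sketch using the OpenAI Python SDK; the model name and the "run_tests" tool are placeholders, not anything specific:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4.1",  # placeholder model name
        messages=[
            {"role": "system", "content": "You are a coding assistant."},
            {"role": "user", "content": "Write a function that reverses a string."},
        ],
        tools=[{
            # optional tool definition; "run_tests" is a made-up example
            "type": "function",
            "function": {
                "name": "run_tests",
                "description": "Run the project's test suite.",
                "parameters": {"type": "object", "properties": {}},
            },
        }],
    )
    print(resp.choices[0].message.content)

Messages in, a message (or tool call) out. What changed is how well the model on the other end handles it.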

fragmede 10 hours ago | parent | prev [-]

The code that's generated when given a long leash is still crap. But damned if I didn't use a JIRA MCP and a GitLab MCP, and just have the corporate AI "do" a couple of well defined and well scoped tickets, including interacting with JIRA to get the ticket contents, update its progress, push to GitLab, and open an MR. Then the corporate CodeRabbit does a first-pass code review so any glaring errors are stomped out before a human reviews it.

What's more scary is that the JIRA tickets were created from a design doc that was half AI generated in the first place. The human proposed something, the AI asked clarifying questions, then broke the project down into milestones and then tickets, and then created the epic and issues in JIRA.

One of my tradie friends taking an HVAC class tells me that there are a couple of programmers in his class looking to switch careers. I don't know what the future brings, but those programmers (sorry, "software developers") may have the right idea.

llmslave2 4 hours ago | parent [-]

Yes, we get it: there is a ton of "work" being done in corporate environments, in which the slop that generative AI churns out is similar to the slop that humans churn out. Congrats.

PaulRobinson 13 hours ago | parent | prev | next [-]

LLM capability improvement is hitting a plateau, with recent advancements mostly relying on accessing context locally (RAG) or remotely (MCP), and a lot of extra tokens (read: drinking water and energy) being spent prompting models for "reasoning". Foundation-wise, observed improvements are incremental, not exponential.
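(By RAG I just mean roughly this pattern; a minimal sketch where embed() stands in for whatever embedding model you actually call, and the documents are made up:)

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Stand-in for a real embedding model/API call.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.normal(size=384)
        return v / np.linalg.norm(v)

    docs = [
        "internal style guide ...",
        "payments service README ...",
        "on-call runbook ...",
    ]
    doc_vecs = np.stack([embed(d) for d in docs])

    def retrieve(query: str, k: int = 2) -> list[str]:
        q = embed(query)
        scores = doc_vecs @ q  # cosine similarity, since the vectors are unit length
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    question = "How do I add a new payment provider?"
    context = "\n\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    # `prompt` then goes to the model; the stuffed-in context is exactly where
    # the extra tokens (and energy) go.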

> able to do most programming tasks, with the limitation being our ability to specify the problem sufficiently well

We've spent 80 years trying to figure that out. I'm not sure why anyone would think we're going to crack this one anytime in the next few years.

eru 12 hours ago | parent [-]

> Foundation-wise, observed improvements are incremental, not exponential.

Incremental gains are fine. I suspect the capability of models scales roughly as the logarithm of their training effort.
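A toy illustration of that hunch (the constant and the compute figures are made up; only the shape of the curve matters):

    import math

    # If capability ~ k * log10(training compute), every 10x of compute buys the
    # same fixed increment k.
    k = 1.0
    for flops in (1e21, 1e22, 1e23, 1e24):
        print(f"{flops:.0e} FLOPs -> capability {k * math.log10(flops):.1f}")
    # 21.0, 22.0, 23.0, 24.0: steady, incremental gains per order of magnitude.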

> (read: drinking water and energy)

Water is not much of a concern in most of the world. And you can cool without using water, if you need to. (And it doesn't have to be drinking water anyway.)

Yes, energy is a limiting factor. But the big sink is in training. And we are still getting more energy efficient, at least to reach any given capability level; of course, in total we will be spending more and more energy to reach ever higher levels.

majormajor 10 hours ago | parent | prev [-]

> the limitation being our ability to specify the problem sufficiently well

Such has always been the largest issue with software development projects, IMO.

tracker1 16 hours ago | parent | prev [-]

I would think/hope that code-assist LLMs would be optimized towards supportable, legible code solutions overall. Mostly in that they can at least provide a jumping-off point, while accepting that more often than not they won't be able to produce complete, finished solutions.