| ▲ | GoatInGrey 3 days ago | parent | next [-] | | There are several rubs with that operating protocol, extending beyond the "you're holding it wrong" claim.

1) There exists a threshold, only identifiable in retrospect, past which it would have been faster to locate or write the code yourself than to navigate the LLM's correction loop or otherwise ensure one-shot success.

2) The intuition and motivations of LLMs derive from a latent space that the LLM cannot actually access. I cannot get a reliable answer on why the LLM chose the approaches it did; it can only retroactively confabulate. This is unlike human developers, who can recall their reasoning off-hand, or at least review associated tickets and meeting notes to jog their memory. Having the prompter always document enough to bridge this provenance gap runs into rub #1.

3) Gradually building prompt dependency, where one's ability to take over from the LLM declines and one can no longer answer questions or develop at the same velocity oneself.

4) My development costs increasingly being determined by the AI labs and the hardware vendors they partner with, particularly when the former will need to raise prices dramatically over the coming years just to break even on 2025 economics. | | |
| ▲ | simonw 3 days ago | parent | next [-] | | The value I'm getting from this stuff is so large that I'll take those risks, personally. | | |
| ▲ | th0ma5 3 days ago | parent [-] | | Glad you found a way to be unfalsifiable! Lol | | |
| ▲ | scubbo 3 days ago | parent [-] | | Many people - simonw is the most visible of them, but there are countless others - have given up trying to convince folks who are determined not to be convinced, and are simply enjoying their increased productivity. This is not a competition or an argument. | | |
| ▲ | llmslave2 3 days ago | parent [-] | | Maybe they are struggling to convince others because they are unable to produce convincing evidence? My experience scrolling X and HN is a bunch of people going "omg opus omg Claude Code I'm 10x more productive" and that's it. Just hand-wavy anecdotes based on their own perceived productivity. I'm open to being convinced, but just saying stuff is not convincing. It's the opposite: it feels like people have been put under a spell.

I'm following The Primeagen; he's doing a series where he tries these tools on stream and follows people's advice on how to use them best. He's actually quite a good programmer, so I'm eager to see how it goes. So far he isn't impressed, and thus neither am I. If he cracks it and unlocks significant productivity then I will be convinced. | | |
| ▲ | enraged_camel 3 days ago | parent [-] | | >> Maybe they are struggling to convince others because they are unable to produce evidence that is able to convince people? Simon has produced plenty of evidence over the past year. You can check their submission history and their blog: https://simonwillison.net/ The problem with people asking for evidence is that there's no level of evidence that will convince them. They will say things like "that's great but this is not a novel problem so obviously the AI did well" or "the AI worked only because this is a greenfield project, it fails miserably in large codebases". | | |
| ▲ | llmslave2 3 days ago | parent [-] | | It's true that some people will just continually move the goalposts because they are invested in their beliefs. But that doesn't mean the skepticism around certain claims isn't relevant. Nobody serious is disputing that LLMs can generate working code. They dispute claims like "Agentic workflows will replace software developers in the short to medium term" or "Agentic workflows lead to 2-100x improvements in productivity across the board". This is what people are looking for in terms of evidence, and there just isn't any.

Thus far, we do have evidence that AI (at least in OSS) produces a 19% decrease in productivity [0]. We also have evidence that it harms our cognitive abilities [1]. Anecdotally, I have found myself lazily reaching for LLM assistance when encountering a difficult problem instead of thinking deeply about it. Anecdotally, I also struggle to be more productive using AI-centric agentic workflows in my areas of expertise.

We want evidence that "vibe engineering" is actually more productive across the entire lifespan of a software project. We want evidence that it produces better outcomes. Nobody has yet shown that. It's just people claiming that because they vibe coded some trivial project, all of software development can benefit from this approach. Recently a principal engineer at Google claimed that Claude Code wrote their team's entire year's worth of work in a single afternoon. They later walked that claim back, but most do not.

I'm more than happy to be convinced, but it's becoming extremely tiring to hear the same claims being parroted without evidence and then get called a luddite when you question them. It's also tiring when you push people on it and they blame it on the model you use, and then the agent, and then the way you handle context, and then the prompts, and then "skill issue". Meanwhile all they have to show is some slop that could be hand-coded in a couple of hours by someone familiar with the domain.

I use AI, and I was pretty bullish on it for the last two years, but the combination of it simply not living up to expectations + the constant barrage of what feels like a stealth marketing campaign parroting the same thing over and over (the new model is way better, unlike the other times we said that) + the amount of absolute slop code that seems to continue to increase + companies like Microsoft producing worse and worse software as they shoehorn AI into every single product (Office was renamed to Copilot 365) has made me very sensitive to it, much in the same way I was very sensitive to the claims being made by certain VC-backed webdev companies regarding their product + framework over the last few years. I'm not even going to bring up the economic, social, and environmental issues because I don't think they're relevant here, but they do contribute to my annoyance with this stuff.

[0] https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
[1] https://news.harvard.edu/gazette/story/2025/11/is-ai-dulling... | | |
| ▲ | lunar_mycroft 3 days ago | parent [-] | | > Thus far, we do have evidence that AI (at least in OSS) produces a 19% decrease in productivity

I generally agree with you, but I'd be remiss if I didn't point out that it's plausible that the slowdown observed in the METR study was at least partially due to the subjects' lack of experience with LLMs. Someone with more experience performed the same experiment on themselves, and couldn't find a significant difference between using LLMs and not [0]. I think the more important point here is that programmers' subjective assessment of how much LLMs help them is not reliable, and is biased towards the LLMs.

[0] https://mikelovesrobots.substack.com/p/wheres-the-shovelware... | | |
| ▲ | llmslave2 3 days ago | parent [-] | | I think we're on the same page re. that study. Actually, your link made me think about the ongoing debate around IDEs vs stuff like Vim. Some people swear by IDEs and insist they drastically improve their productivity; others dismiss them or even claim they make them less productive. Sound familiar? I think it's possible these AI tools are simply another way to type code, and the differences, averaged out, end up being a wash. | | |
| ▲ | AstroBen 2 days ago | parent [-] | | IDEs vs vim makes a lot of sense. AI really does feel like using an IDE in a certain way.

Using AI absolutely makes it feel like I'm more productive. But when I look back at the end of the day at what I actually got done, it would be ludicrous to say it was multiple times my pre-AI output.

Despite all the people replying to me saying "you're holding it wrong", I know the fix for it doing the wrong thing: specify in more detail what I want. The problem with that is twofold:

1. How much to specify? As little as possible is the ideal, if we want to maximize how much it can help us. A balance here is key. If I need to detail every minute thing I may as well write the code myself.

2. If I get this step wrong, I still have to review everything, rethink it, go back and re-prompt, costing time.

When I'm working on production code, I have to understand it all to confidently commit. It costs time for me to go over everything, sometimes over multiple iterations. Sometimes the AI uses things I don't know about and I need to dig in to understand them.

AI is currently writing 90% of my code. Quality is fine. It's fun! It's magical when it nails something one-shot. I'm just not confident it's faster overall. | |
| ▲ | llmslave2 2 days ago | parent [-] | | I think this is an extremely honest perspective. It's actually kind of cool that it's gotten to the point it can write most code - albeit with a lot of handholding. |
| ▲ | theshrike79 3 days ago | parent | prev | next [-] | | I've said this multiple times: this is why you use this AI bubble (it IS a bubble) to get the VC-funded AI models at dirt cheap prices and CREATE tools for yourself. Need a very specific linter? AI can do it. Need a complex Roslyn analyser? AI. Any kind of scripting or automation that you run on your own machine? AI. None of that will go away or suddenly stop working when the bubble bursts.

Within just the last 6 months I've built so many little utilities to speed up my work (and personal life) that it's completely bonkers. Most went from "hmm, might be cool to..." to a good-enough script/program in an evening while doing chores.

Even better, start getting the feel for local models. Current gen home hardware is getting good enough and the local models smart enough so you can, with the correct tooling, use them for surprisingly many things. | |
| ▲ | MarsIronPI 2 days ago | parent | next [-] | | > Even better, start getting the feel for local models. Current gen home hardware is getting good enough and the local models smart enough so you can, with the correct tooling, use them for suprisingly many things. Are there any local models that are at least somewhat comparable to the latest-and-greatest (e.g. Opus 4.5, Gemini 3), especially in terms of coding? | |
| ▲ | lunar_mycroft 3 days ago | parent | prev [-] | | A risk I see with this approach is that when the bubble pops, you'll be left dependent on a bunch of tools which you don't know how to maintain or replace on your own, and won't have (or be able to afford) access to LLMs to do it for you. | |
| ▲ | theshrike79 3 days ago | parent | next [-] | | The "tools" in this context are literally a few hundred lines of Python or Github CI build pipeline, we're not talking about 500kLOC massive applications. I'm building tools, not complete factories :) The AI builds me a better hammer specifically for the nails I'm nailing 90% of the time. Even if the AI goes away, I still know how the custom hammer works. | |
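To make the scale concrete, here is a minimal sketch of the kind of tool being described: a project-specific linter of the sort mentioned upthread, a few dozen lines of standalone Python. The rule (no bare print() outside cli.py), the src/ layout, and the file names are hypothetical examples, not taken from anyone's actual project.

    #!/usr/bin/env python3
    """Toy project-specific linter: flag bare print() calls outside cli.py.

    The rule and the src/ layout are hypothetical; adapt them to whatever
    convention your own project actually enforces.
    """
    import ast
    import sys
    from pathlib import Path

    def bare_prints(path: Path) -> list[int]:
        """Return the line numbers of every print() call in the file."""
        tree = ast.parse(path.read_text(), filename=str(path))
        return [
            node.lineno
            for node in ast.walk(tree)
            if isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        ]

    def main() -> int:
        failed = False
        for path in Path("src").rglob("*.py"):
            if path.name == "cli.py":  # prints are fine in the CLI entry point
                continue
            for lineno in bare_prints(path):
                print(f"{path}:{lineno}: bare print(); use logging instead")
                failed = True
        return 1 if failed else 0

    if __name__ == "__main__":
        sys.exit(main())

Dropped into CI or a pre-commit hook, that is the whole tool; even if the AI that wrote it goes away, the script stays short enough to read and maintain by hand.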
| ▲ | AstroBen 3 days ago | parent | prev | next [-] | | I thought that initially, but I don't think the skills AI weakens in me are particularly valuable.

Let's say AI becomes too expensive - I more or less only have to sharpen up being able to write the language: my active recall of the syntax, common methods and libraries. That's not hard or much of a setback.

Maybe this would be a problem if you're purely vibe coding, but I haven't seen that work long term. | |
| ▲ | baq 3 days ago | parent | prev [-] | | Open source models hosted by independent providers (or even by yourself, which, if the bubble pops, will be affordable if you manage to pick up hardware in the fire sales) are already good enough to explain most code. |
| ▲ | kaydub 2 days ago | parent | prev [-] | | > 1) There exists a threshold, only identifiable in retrospect, past which it would have been faster to locate or write the code yourself than to navigate the LLM's correction loop or otherwise ensure one-shot success.

I can run multiple agents at once, across multiple codebases (or the same codebase on multiple different branches), doing the same or different things. You absolutely can't keep up with that. Maybe on the one singular task you were working on, sure, but the fact that I can work on multiple different things without the same cognitive load will blow you out of the water.

> 2) The intuition and motivations of LLMs derive from a latent space that the LLM cannot actually access. I cannot get a reliable answer on why the LLM chose the approaches it did; it can only retroactively confabulate. This is unlike human developers, who can recall their reasoning off-hand, or at least review associated tickets and meeting notes to jog their memory. Having the prompter always document enough to bridge this provenance gap runs into rub #1.

Tell the LLM to document in comments why it did things. Human developers often leave, and then nobody with knowledge of the codebase or the "whys" is even around to give details. Devs are notoriously terrible about documentation.

> 3) Gradually building prompt dependency, where one's ability to take over from the LLM declines and one can no longer answer questions or develop at the same velocity oneself.

You can't develop at the same velocity, so drop that assumption now. There are all kinds of lower abstractions that you build on top of that you probably can't explain currently.

> 4) My development costs increasingly being determined by the AI labs and the hardware vendors they partner with, particularly when the former will need to raise prices dramatically over the coming years just to break even on 2025 economics.

You aren't keeping up with the actual economics. This shit is technically profitable; the unprofitable part is the ongoing battle between LLM providers to have the best model. They know software in the past has often been winner-takes-all, so they're all trying to win. |
| ▲ | Capricorn2481 3 days ago | parent | prev | next [-] | | > With the latest models if you're clear enough with your requirements you'll usually find it does the right thing on the first try

It's great that this is your experience, but it's not a lot of people's. There are projects where it's just not going to know what to do. I'm working in a web framework that is a Frankenstein-ing of Laravel and October CMS. It's so easy for the agent to get confused because, even when I tell it this is a different framework, it sees things that look like Laravel or October CMS and suggests solutions that only exist in those frameworks. So there are constant made-up methods and it keeps getting stuck in loops. The documentation is terrible; you just have to read the code, which, despite what people say, Cursor is terrible at, because embeddings are not a real way to read a codebase. | |
| ▲ | simonw 3 days ago | parent [-] | | I'm working mostly in a web framework that's used by me and almost nobody else (the weird little ASGI wrapper buried in Datasette) and I find the coding agents pick it up pretty fast. One trick I use that might work for you as well:

    Clone GitHub.com/simonw/datasette to /tmp
    then look at /tmp/docs/datasette for
    documentation and search the code
    if you need to

Try that with your own custom framework and it might unblock things. If your framework is missing documentation, tell Claude Code to write itself some documentation based on what it learns from reading the code! | |
| ▲ | Capricorn2481 2 days ago | parent [-] | | > I'm working mostly in a web framework that's used by me and almost nobody else (the weird little ASGI wrapper buried in Datasette) and I find the coding agents pick it up pretty fast Potentially because there is no baggage with similar frameworks. I'm sure it would have an easier time with this if it was not spun off from other frameworks. > If your framework is missing documentation tell Claude Code to write itself some documentation based on what it learns from reading the code! If Claude cannot read the code well enough to begin with, and needs supplemental documentation, I certainly don't want it generating the docs from the code. That's just compounding hallucinations on top of each other. | | |
| ▲ | simonw 2 days ago | parent [-] | | Give it a try and see what happens. I find Claude Code is so good at docs that I sometimes investigate a new library by checking out a GitHub repo, deleting the docs/ and README and having Claude write fresh docs from scratch. |
| ▲ | aurumque 3 days ago | parent | prev | next [-] | | In a circuitous way, you can rather successfully have one agent write a specification and another one execute the code changes. Claude Code has a planning mode that lets you work with the model to create a robust specification that can then be executed, asking you the sort of leading questions about places where it already seems to know it could make an incorrect assumption. I say 'agent' but I'm really just talking about separate model contexts, nothing fancy. | |
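A minimal sketch of that two-context split, in Python. Nothing here is Claude Code's actual plan mode: complete() is a hypothetical placeholder for whichever model API you use, and the prompts are illustrative only.

    # Sketch of "one context plans, another executes" using two separate
    # model contexts. complete() is a hypothetical placeholder: wire it up
    # to your provider's SDK or a local model; it is not a real API.

    def complete(system: str, prompt: str) -> str:
        raise NotImplementedError("call your model API of choice here")

    def plan_then_execute(task: str, repo_overview: str) -> str:
        # Context 1: produce a reviewable specification, explicitly no code.
        spec = complete(
            system=("You are a planner. Write a step-by-step specification. "
                    "List every assumption as an explicit question. Do not write code."),
            prompt=f"Task: {task}\n\nRepo overview:\n{repo_overview}",
        )

        # Human checkpoint: answer the leading questions before any code exists.
        print(spec)
        amendments = input("Amendments to the spec (blank to accept): ")
        approved = spec + (f"\n\nAmendments:\n{amendments}" if amendments else "")

        # Context 2: a fresh context implements only what the spec says.
        return complete(
            system=("You are an implementer. Follow the specification exactly; "
                    "if something is unspecified, stop and ask."),
            prompt=approved,
        )

The human checkpoint between the two calls is where the arguing happens, which is the point of doing it before any code is generated.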
| ▲ | mikestorrent 3 days ago | parent [-] | | Cursor's planning functionality is very similar and I have found that I can even use "cheap" models like their Composer-1 and get great results in the planning phase, and then turn on Sonnet or Opus to actually produce the plan. 90% of the stuff I need to argue about is during the planning phase, so I save a ton of tokens and rework just making a really good spec. It turns out that Waterfall was always the correct method, it's just really slow ;) | | |
| ▲ | aurumque 2 days ago | parent [-] | | Did you know that software specifications used to be almost entirely flow charts? There is something to be said for that and waterfall. |
| ▲ | cadamsdotcom 3 days ago | parent | prev | next [-] | | Even better, have it write code to describe the right thing then run its code against that, taking yourself out of that loop. | |
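In practice, "code that describes the right thing" usually means tests written and agreed on up front. A made-up example: you ask the agent for a slugify() helper, review (or co-write) tests like the ones below, then let it iterate until they pass. The module and function names are hypothetical.

    # Hypothetical test file the agent writes (and you approve) before the
    # implementation exists; the agent then loops on pytest until green.
    import pytest
    from myproject.text import slugify  # made-up module and function

    def test_lowercases_and_hyphenates():
        assert slugify("Hello World") == "hello-world"

    def test_strips_punctuation():
        assert slugify("What's new, in 2025?") == "whats-new-in-2025"

    def test_collapses_repeated_separators():
        assert slugify("a  --  b") == "a-b"

    def test_rejects_empty_input():
        with pytest.raises(ValueError):
            slugify("")

Once the tests are agreed on, the agent's loop is mechanical: run the tests, read the failures, adjust the implementation, repeat, with you out of that inner loop.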
| ▲ | giancarlostoro 3 days ago | parent | prev [-] | | And if you've told it too many times to fix it, tell it someone has a gun to your head; for some reason it almost always gets it right the very next time. | |
| ▲ | dare944 3 days ago | parent [-] | | If you're a developer at the dawn of the AI revolution, there is absolutely a gun to your head. | | |
| ▲ | giancarlostoro 2 days ago | parent [-] | | Yeah, if anyone can truly afford the AI empire. Remember all these "leading" companies are running it at a loss, so most companies paying for it are severely underpaying the cost of it all. We would need an insane technological breakthrough of unlimited memory and power before I start to worry, and at that point, I'll just look for a new career. |