bblcla 4 hours ago

(Author here)

> I'm not entirely convinced by the anecdote here where Claude wrote "bad" React code

Yeah, that's fair - a friend of mine also called this out on Twitter (https://x.com/konstiwohlwend/status/2010799158261936281) and I went into more technical detail about the specific problem there.

> I've seen Claude make mistakes like that too, but then the moment you say "you can modify the calling code as well" or even ask "any way we could do this better?" it suggests the optimal solution.

I agree, but I think I'm less optimistic than you that Claude will be able to catch its own mistakes in the future. On the other hand, I can definitely see how a ~more intelligent model might be able to catch mistakes on a larger and larger scale.

> I expect that adding a CLAUDE.md rule saying "always look for more efficient implementations that might involve larger changes and propose those to the user for their confirmation if appropriate" might solve the author's complaint here.

I'm not sure about this! There are a few things Claude does that seem unfixable even by updating CLAUDE.md.

Some other footguns I keep seeing in Python and constantly have to fix despite CLAUDE.md instructions are:

- writing deeply nested if clauses instead of keeping functions simple by returning early

- putting imports in functions instead of at the top-level

- swallowing exceptions instead of raising them (constantly a huge problem)

These are small, but I think it says something about what these models can do that even Opus 4.5 still fails at such simple tasks.
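
To make these concrete, here's a minimal before/after sketch of the kind of fix I keep applying (the function name and error messages are invented for illustration):

    import json  # belongs at the top level, not inside the function

    # What Claude tends to write: a function-level import, nested ifs,
    # and a swallowed exception that hides the real failure.
    def load_config_bad(path):
        import json
        if path:
            if path.endswith(".json"):
                try:
                    with open(path) as f:
                        return json.load(f)
                except Exception:
                    return None  # the caller never learns what went wrong
        return None

    # The fix: top-level import, early returns, and errors that propagate.
    def load_config(path):
        if not path:
            raise ValueError("path is required")
        if not path.endswith(".json"):
            raise ValueError(f"expected a .json file, got {path!r}")
        with open(path) as f:
            return json.load(f)  # OSError / JSONDecodeError propagate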

ako 3 hours ago

> I agree, but I think I'm less optimistic than you that Claude will be able to catch its own mistakes in the future. On the other hand, I can definitely see how a ~more intelligent model might be able to catch mistakes on a larger and larger scale.

Claude already does this. Yesterday I asked it why some functionality was slow; it did some research and then came back with all the right performance numbers, how often certain code was called, and opportunities to cache results to speed up execution. It refactored the code, ran performance tests, and reported the improvements.

ekidd 3 hours ago

I have been reading through this thread, and my first reaction to many of the comments was "Skill issue."

Yes, it can build things that have never existed before. Yes, it can review its own code. Yes, it can do X, Y and Z.

Does it do all these things spontaneously with no structure? No, it doesn't. Are there tricks to getting it to do some of these things? Yup. If you want code review, start by writing a code review "skill". Have that skill ask Opus to fork off several subagents to review different aspects, then synthesize the reports, with issues broken down by Critical, Major, and Minor. Have the skill describe all the things you want from a review.
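
For concreteness, here's roughly the shape of such a skill file. This is a sketch from memory rather than the documented format; the frontmatter fields and file location (e.g. .claude/skills/code-review/SKILL.md) may differ across Claude Code versions, so check the current skills docs:

    ---
    name: code-review
    description: Structured multi-agent review of the current changes
    ---

    When asked to review code:

    1. Fork subagents to review separate aspects in parallel:
       correctness, error handling, performance, API design.
    2. Have each subagent report findings with file and line references.
    3. Synthesize the reports into one review, grouping issues
       as Critical, Major, or Minor.
    4. If the best fix requires changing calling code, propose the
       larger refactor to the user instead of working around it.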

There are, as the OP pointed out, a lot of reasons why you can't run it with no human at all. But with an experienced human nudging it? It can do a lot.

ako 2 hours ago

It's basically not very different from working with an average development team as a product owner/manager: you need to feed it specific requirements or it will hallucinate some, and bugs are expected even with unit tests and testers on the team. And yes, as a product owner you also make mistakes and never have all the requirements up front, but the nice thing about working with a GenAI coder is that you can iterate over those requirement gaps, hallucinated requirements, and bugs in minutes, not days.

chapel 3 hours ago

Those Python issues are things I had to deal with earlier last year with Claude Sonnet 3.7, 4.0, and to a lesser extent Opus 4.0 when it was available in Claude Code.

In the Python projects I've been using Opus 4.5 with, it hasn't been showing those issues as often, but then again the projects are throwaway and I cared more about the output than the code itself.

The nice thing about these agentic tools is that if you set up feedback loops for them, they tend to fix issues that are brought up. Much of what you bring up can be caught by linting.
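
For example, ruff can be told to flag exactly those patterns. These are the rule codes I believe map to them; double-check against the ruff docs:

    # pyproject.toml
    [tool.ruff.lint]
    extend-select = [
        "PLC0415",  # import-outside-top-level: imports buried in functions
        "S110",     # try-except-pass: silently swallowed exceptions
        "BLE001",   # blind except: catching bare `Exception`
        "RET505",   # superfluous-else-return: nudges toward early returns
    ]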

The biggest unlock for me with these tools is not letting the context get bloated: avoid compaction, focus on small chunks of work, and clear the context before working on something else.

bblcla 3 hours ago

Arguably linting is a kind of abstraction block!

pluralmonad 3 hours ago

I wonder if this is specific to Python. I've had no trouble like that with Claude generating Elixir; it sticks to the existing styles and paradigms quite well, and I can see in the thinking traces that it takes this into consideration.

doug_durham 3 hours ago

That's where you come in as an experienced developer. You point out the issues and iterate. That's the normal flow of working with these tools.

bblcla 3 hours ago

I agree! Like I said at the end of the post, I think Claude is a great tool. In this piece, I'm arguing against the 'AGI' believers who think it's going to replace all developers.