▲ bblcla 4 hours ago
(Author here)

> I'm not entirely convinced by the anecdote here where Claude wrote "bad" React code

Yeah, that's fair - a friend of mine also called this out on Twitter (https://x.com/konstiwohlwend/status/2010799158261936281) and I went into more technical detail about the specific problem there.

> I've seen Claude make mistakes like that too, but then the moment you say "you can modify the calling code as well" or even ask "any way we could do this better?" it suggests the optimal solution.

I agree, but I think I'm less optimistic than you that Claude will be able to catch its own mistakes in the future. On the other hand, I can definitely see how a ~more intelligent model might be able to catch mistakes on a larger and larger scale.

> I expect that adding a CLAUDE.md rule saying "always look for more efficient implementations that might involve larger changes and propose those to the user for their confirmation if appropriate" might solve the author's complaint here.

I'm not sure about this! There are a few things Claude does that seem unfixable even by updating CLAUDE.md. Some other footguns I keep seeing in Python, and constantly have to fix despite CLAUDE.md instructions, are:

- writing lots of nested if clauses instead of keeping functions simple by returning early
- putting imports inside functions instead of at the top level
- swallowing exceptions instead of raising them (constantly a huge problem)

These are small, but I think it's informative about what the models can do that even Opus 4.5 still fails at these simple tasks.
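To make those three footguns concrete, here's a minimal sketch (not from the thread - `load_config_bad`/`load_config_good` are invented names) contrasting the anti-patterns with the plain fixes:

```python
import json  # good: imports live at the top level


def load_config_bad(path):
    try:
        import json  # footgun: import buried inside the function
        with open(path) as f:
            data = json.load(f)
    except Exception:
        return {}  # footgun: swallows *every* error, caller can't tell why
    if data:  # footgun: nested ifs instead of early returns
        if "name" in data:
            if data["name"]:
                return data
    return {}


def load_config_good(path):
    # Early returns/raises keep the happy path flat, and I/O or parse
    # errors propagate to the caller instead of being silently eaten.
    with open(path) as f:
        data = json.load(f)
    if not data.get("name"):
        raise ValueError(f"config at {path} is missing 'name'")
    return data
```

The bad version returns an empty dict whether the file is missing, malformed, or merely incomplete; the good version makes each failure mode distinct.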
▲ ako 3 hours ago
> I agree, but I think I'm less optimistic than you that Claude will be able to catch its own mistakes in the future. On the other hand, I can definitely see how a ~more intelligent model might be able to catch mistakes on a larger and larger scale.

Claude already does this. Yesterday I asked it why some functionality was slow; it did some research and then came back with all the right performance numbers, how often certain code was called, and opportunities to cache results to speed up execution. It refactored the code, ran performance tests, and reported the performance improvements.
▲ chapel 3 hours ago
Those Python issues are things I had to deal with earlier last year with Claude Sonnet 3.7, Sonnet 4.0, and, to a lesser extent, Opus 4.0 when it was available in Claude Code. In the Python projects I've been using Opus 4.5 with, it hasn't been showing those issues as often - but then again, those projects are throwaway and I cared more about the output than the code itself.

The nice thing about these agentic tools is that if you set up feedback loops for them, they tend to fix the issues that are surfaced. Much of what you bring up can be caught by linting.

The biggest unlock for me with these tools is not letting the context get bloated: not using compaction, and focusing on small chunks of work and clearing the context before moving on to something else.
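On the "caught by linting" point: a sketch of a ruff configuration that flags exactly these patterns. The rule codes below are as I recall them from ruff's pylint/flake8-bandit/mccabe ports - verify against the ruff rules docs before relying on them:

```toml
# pyproject.toml fragment (assumed rule codes - check the ruff docs)
[tool.ruff.lint]
select = [
    "PLC0415",  # import not at top level of file
    "S110",     # try/except/pass that silently swallows exceptions
    "BLE001",   # blind `except Exception:` catch
    "C901",     # function too complex (catches deeply nested ifs)
]

[tool.ruff.lint.mccabe]
max-complexity = 8
```

Wiring the linter into the agent's feedback loop (e.g. running it after each edit) is what turns these from style nits into errors the model actually fixes.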
▲ pluralmonad 3 hours ago
I wonder if this is specific to Python. I've had no trouble like that with Claude generating Elixir; it sticks to the existing styles and paradigms quite well. You can see in the thinking traces that Claude takes this into consideration.
▲ doug_durham 3 hours ago
That's where you come in as an experienced developer. You point out the issues and iterate. That's the normal flow of working with these tools. | |||||||||||||||||