simonw 4 hours ago
I'm not entirely convinced by the anecdote here where Claude wrote "bad" React code:

> But in context, this was obviously insane. I knew that key and id came from the same upstream source. So the correct solution was to have the upstream source also pass id to the code that had key, to let it do a fast lookup.

I've seen Claude make mistakes like that too, but then the moment you say "you can modify the calling code as well" or even ask "any way we could do this better?" it suggests the optimal solution.

My guess is that Claude is trained to bias towards making minimal edits to solve problems. This is a desirable property, because six months ago a common complaint about LLMs was that you'd ask for a small change and they would rewrite dozens of additional lines of code.

I expect that adding a CLAUDE.md rule saying "always look for more efficient implementations that might involve larger changes and propose those to the user for their confirmation if appropriate" might solve the author's complaint here.
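Something like this, say, as a CLAUDE.md entry (purely a sketch - I haven't tested this exact wording, and the section heading is invented):

```markdown
## Editing scope

- Prefer minimal edits by default.
- But always look for more efficient implementations, even ones that
  involve larger changes, and propose those to the user for
  confirmation if appropriate.
- You may modify calling code and upstream data sources, not just the
  code at the site of the requested change.
```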
bblcla 4 hours ago
(Author here)

> I'm not entirely convinced by the anecdote here where Claude wrote "bad" React code

Yeah, that's fair - a friend of mine also called this out on Twitter (https://x.com/konstiwohlwend/status/2010799158261936281) and I went into more technical detail about the specific problem there.

> I've seen Claude make mistakes like that too, but then the moment you say "you can modify the calling code as well" or even ask "any way we could do this better?" it suggests the optimal solution.

I agree, but I think I'm less optimistic than you that Claude will be able to catch its own mistakes in the future. On the other hand, I can definitely see how a ~more intelligent model might be able to catch mistakes on a larger and larger scale.

> I expect that adding a CLAUDE.md rule saying "always look for more efficient implementations that might involve larger changes and propose those to the user for their confirmation if appropriate" might solve the author's complaint here.

I'm not sure about this! There are a few things Claude does that seem unfixable even by updating CLAUDE.md. Some other footguns I keep seeing in Python, and constantly have to fix despite CLAUDE.md instructions, are (sketched below):

- writing lots of nested if clauses instead of writing simple functions that return early
- putting imports inside functions instead of at the top level
- swallowing exceptions instead of raising (constantly a huge problem)

These are small, but I think it's informative of what the models can do that even Opus 4.5 still fails at these simple tasks.
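To make those three concrete, here's a minimal sketch of each footgun next to the version I'd want instead. The names (Order, process, parse_port) are invented stand-ins, not code from my projects:

```python
import json  # imports belong here, at module scope, not inside functions
from dataclasses import dataclass, field


@dataclass
class Order:
    items: list = field(default_factory=list)
    paid: bool = False


def ship(order: Order) -> str:
    return f"shipped {len(order.items)} item(s)"


# 1. Nested if clauses instead of returning early
def process_nested(order):
    if order is not None:
        if order.items:
            if order.paid:
                return ship(order)


def process_early_return(order):
    if order is None or not order.items or not order.paid:
        return None
    return ship(order)


# 2. Imports buried inside functions instead of at the top level
def load_config_bad(text: str) -> dict:
    import json  # runs fine, but hides the dependency from readers and tools
    return json.loads(text)


# 3. Swallowing exceptions instead of raising
def parse_port_swallowed(value: str):
    try:
        return int(value)
    except Exception:
        return None  # the failure silently becomes "no data"


def parse_port_raising(value: str) -> int:
    return int(value)  # let a bad value blow up where the caller can see it
```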
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Kuinox 4 hours ago
> My guess is that Claude is trained to bias towards making minimal edits to solve problems.

I don't have the same feeling. I find that Claude tends to produce wayyyyy too much code to solve a problem, compared to other LLMs.
joshribakoff 4 hours ago
I expect that adding instructions that attempt to undo training produces worse results than not including the overbroad generalization in the training in the first place. I think the author isn't making a complaint; they're documenting a tradeoff.
threethirtytwo 3 hours ago
Definitely. The training setup encourages this. The AI is also actively trying to trick you, and we know that for a fact.

Problems whose solutions are too complicated to explain, or to output in one sitting, are out of the question. The AI will still bias towards one-shot solutions when given one of these problems, because all the training is biased towards short solutions. It's not really practical to give it training data with multi-step, ultra-complicated solutions. Think about it: for the thousands of questions given to it for reinforcement, the trainer is going to be trying to knock those out as efficiently as possible, so they have to be readable problems with shorter, readable solutions. So we know the AI biases towards shorter, readable solutions.

Second, any solution that tricks the reader will pass training. By definition, there is a subset of question/solution pairs that meets this criterion, because we as trainers are simply unaware we are being tricked. This data leaks into the training, and as a result the AI biases towards deception as well.

So, all in all, it is trained to trick you and to give you the best solution that fits into a context readable in one sitting. In theory we could get it to do what we want only if we had perfect reinforcement data. The reliability we're looking for seems to be just over this hump.
AIorNot 4 hours ago
Well, yes, but the wider point is that it takes new human skills to manage them - like a pair of horses under your bridle, so to speak.

When it comes down to it, these AI tools are like the jump from artisanal-era hand tools to power tools and machines - like going from a surgical knife to a machine gun. They operate at a faster pace without comprehending like humans, and without allowing humans time to comprehend all the side effects and massive assumptions they make on every run in their context window.

Humans have to adapt to managing them correctly, and at the right scale, to be effective - and that becomes something you learn.