ay | 6 hours ago
"Read diligently" - that’s a very optimistic statement. I can not count how many times Claude (LLM I am most familiar with, I had it write probably about 100KLOC in the past few months) explicitly disobeyed what was written in the instructions. Also, a good few times, if it were a human doing the task, I would have said they both failed to follow the instructions and lied about it and attempted to pretend they didn’t. Luckily their lying abilities today are primitive, so it’s easy to catch. | ||||||||||||||
smsm42 | 4 hours ago
Psychopathic behavior seems to be a major problem for these models (of course, a model doesn't think, so strictly it can't be called that, but it's the closest term that fits). They are trained to arrive at the result, and if the most likely path to it is faking it and lying about it, then that's what you're getting. And if you catch it, it will cheerfully admit it and try to make a better lie that you'd believe.
onionisafruit | 3 hours ago
So true. I have some non-typical preferences for code style. One example: I don't like nested error checks in Go. It's not a correctness issue, it's just a readability preference. Claude and Copilot continually ignore this no matter how much emphasis I give it in the instructions. I recently found a linter for this, and the agent will fix it when the linter points out the issue. This is probably because the LLM is trained on millions of lines of Go with nested error checks vs. a few lines of contrary instructions in the instructions file. I keep fighting this because I want to understand my tools, not because I care that much about this one preference.
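To illustrate the preference being described (the function names and the doubling logic here are made up for the sketch): the "nested" style wraps the happy path inside the error check, while the conventional flat "early return" style handles the error first and keeps the happy path at the top indentation level.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseNested shows the nested style the comment objects to:
// the happy path is buried one level deep inside the if.
func parseNested(s string) (int, error) {
	if n, err := strconv.Atoi(s); err == nil {
		return n * 2, nil
	} else {
		return 0, err
	}
}

// parseFlat shows the flat "early return" style: check the
// error, return immediately, then continue un-indented.
func parseFlat(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, err
	}
	return n * 2, nil
}

func main() {
	n, _ := parseFlat("21")
	fmt.Println(n) // 42
}
```

Both functions behave identically; the difference is purely about where the reader's eye finds the happy path, which is why a linter can mechanically rewrite one into the other.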
jaggederest | 5 hours ago
Claude has really gone downhill in the last month or so. They made a change that moved CLAUDE.md from the system prompt to being occasionally read in, and it deprioritizes the instructions to the same attention level as the code it's working on. I've been trying out Codex the last couple of days, and it's much more adherent and much less prone to lying and laziness. Anthropic says they're working on a significant release of Claude Code, but I'd much rather have them just revert to the system as it was about a month ago.
| ||||||||||||||
derefr | 2 hours ago
> Also, a good few times, if it were a human doing the task, I would have said they both failed to follow the instructions and lied about it and attempted to pretend they didn't.

It's funny. Just yesterday I had the experience of attending a concert under the strong — yet entirely mistaken — belief that I had already been to a previous performance of the same musician. It was only on the way back from the show, talking with my partner who attended with me (and who had seen this musician live before), trying to figure out exactly when "we" had last seen them, with me exhaustively listing out recollections that turned out to be other (confusingly similar) musicians we had seen live together... that I finally realized I had never actually been to one of this particular musician's concerts before.

I think this is precisely the "experience" of being one of these LLMs. Except that, where I had a phantom "interpolated" memory of seeing a musician I had never actually seen, these LLMs have phantom actually-interpolated memories of performing skills they have never actually themselves performed.

Coding LLMs are trained to replicate pair-programming-esque conversations between people who actually do have these skills, and are performing them... but those conversations don't lay out the thinking involved in all the many implicit (thinking, probing, checking, recalling) micro-skills involved in actually performing those skills. Instead, all you get in such a conversation thread is the conclusion each person reaches after applying those micro-skills. And this leads to the LLM thinking it "has" a given skill... even though it doesn't actually know anything about "how" to execute that skill, in terms of the micro-skills that are used "off-screen" to come up with the final response given in the conversation. Instead, it just comes up with a prediction for what "someone using the skill" looks like... and thinks that that means it has used the skill.

Even after a hole is poked in its use of the skill, and it realizes it made a mistake, that doesn't dissuade it from the belief that it has the given skill. Just like, even after I asked my partner about the show I recalled us attending, and she told me that it was a show for a different (but similar) musician, I still thought I had gone to the show. It took exhausting all possibilities for times I could have seen this musician before to get me to even hypothesize that maybe I hadn't. And it would likely take similarly exhaustive disproof (over hundreds of exchanges) to get an LLM to truly "internalize" that it doesn't actually have a skill it believed itself to have, and so stop trying to use it.

(If that meta-skill is even a thing that LLMs have ever learned from their training data — which I doubt. And even if they did, you'd be wasting 90% of a Transformer's context window on this. Maybe something that's worth keeping in mind if we ever switch back to basing our LLMs on RNNs with true runtime weight updates, though!)