nathell 5 hours ago

I’ve hit this! In my otherwise wildly successful attempt to translate a Haskell codebase to Clojure [0], Claude at one point asks:

[Claude:] Shall I commit this progress? [some details about what has been accomplished follow]

Then several background commands finish (by timeout or completing); Claude Code sees this as my input, thinks I haven’t replied to its question, so it answers itself in my name:

[Claude:] Yes, go ahead and commit! Great progress. The decodeFloat discovery was key.

The full transcript is at [1].

[0]: https://blog.danieljanus.pl/2026/03/26/claude-nlp/

[1]: https://pliki.danieljanus.pl/concraft-claude.html#:~:text=Sh...

dgb23 2 hours ago | parent | next [-]

For those who are wondering: these LLMs are trained on special delimiters that mark the different sources of messages. There's typically something like [system][/system], then one each for the agent, the user, and tools. Different model families use different delimiter shapes.

You can even construct a raw prompt and define your own messaging structure purely via the prompt. During my initial tinkering with a local model I did it this way because I didn't know about the special delimiters. It actually kind of worked, and I got it to call tools, just less reliably. It also did some weird things, like repeating the problem statement it was supposed to act on alongside a tool call, and getting into loops where it posed itself similar problems and then tried to fix them with tool calls. Very weird.
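For illustration, here's a minimal sketch of what such a raw prompt looks like, assuming ChatML-style delimiters (`<|im_start|>`/`<|im_end|>`, used by several model families); the exact tokens vary by model and are an assumption here:

```python
def build_raw_prompt(messages):
    """Flatten (role, content) pairs into the delimiter format many chat
    models are trained on. ChatML-style tokens are assumed for the sketch;
    other model families use different delimiters."""
    parts = [f"<|im_start|>{role}\n{content}<|im_end|>"
             for role, content in messages]
    # Leave an open assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_raw_prompt([
    ("system", "You are a helpful coding assistant."),
    ("user", "Shall I commit this progress?"),
])
```

The point of the delimiters is exactly the failure mode above: if tool output or background-command output isn't wrapped in the right role markers, the model has no reliable way to tell user text from stdout.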

In any case, I think the lesson here is that it's all just probabilistic. When it works and the agent does something useful or even clever, then it feels a bit like magic. But that's misleading and dangerous.

sixhobbits 4 hours ago | parent | prev | next [-]

amazing example, I added it to the article, hope that's ok :)

swellep 3 hours ago | parent | prev | next [-]

I've seen something similar. It's hard to get Claude to stop committing on its own once you've granted it permission to do so.

empressplay an hour ago | parent | prev | next [-]

I wonder if this is a result of auto-compacting the context? Maybe when the context is compacted, it inadvertently strips out its own [Header:] and then decides to answer its own questions.

indigodaddy 31 minutes ago | parent [-]

The most likely explanation imv

ares623 5 hours ago | parent | prev [-]

I wonder if tools like Terraform should remove messages like "Run terraform apply plan.out next" that get printed after every `terraform plan`.

bravetraveler 4 hours ago | parent [-]

I don't think so; it feels like the wrong side is getting the attention: degrading the experience for humans (in one tool) because the bots are prone to injection (from any tool). Terraform is used outside of agents; somebody surely finds the reminder helpful.

If Terraform were to oblige, I'd hope it would at the very least check whether it's running in a pipeline or under an agent. That should be obvious from the file descriptors/environment.
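A minimal sketch of that check, assuming the tool suppresses follow-up hints when stdout is not a terminal or when a CI/agent environment variable is set (the `AGENT` variable name here is hypothetical; `CI` is a common convention in pipelines):

```python
import os
import sys

def suggestions_enabled(stdout_is_tty, env):
    """Decide whether to print interactive follow-up hints.

    Suppress them when stdout is piped (e.g. captured by an agent) or
    when a CI/agent environment variable is present. The AGENT variable
    is a hypothetical example, not a real convention.
    """
    if not stdout_is_tty:
        return False
    if env.get("CI") or env.get("AGENT"):
        return False
    return True

# Typical call site in a CLI tool:
if suggestions_enabled(sys.stdout.isatty(), os.environ):
    print('Run terraform apply "plan.out" next')
```

This keeps the hint for interactive human use while staying silent for anything that captures the output.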

What about the next thing that might make a suggestion relying on our discretion? Patch it for agent safety?

TeMPOraL 4 hours ago | parent | next [-]

"Run terraform apply plan.out next" in this context is a prompt injection for an LLM to exactly the same degree it is for a human.

Even a first party suggestion can be wrong in context, and if a malicious actor managed to substitute that message with a suggestion of their own, humans would fall for the trick even more than LLMs do.

See also: phishing.

bravetraveler 4 hours ago | parent | next [-]

Right, I'm fine with humans making the call. We're apparently not so injection-happy or easily confused.

Discretion, etc. We understand that was the tool making a suggestion, not our idea. Our agency isn't in question.

The removal proposal is similar to wanting a phishing-free environment instead of preparing for the inevitability. I could see removing this message based on your point of context/utility, but not to protect the agent. We get no such protection, just training and practice.

A supply chain attack is another matter entirely; I'm sure people would pause at a new suggestion that deviates from their plan/training. As shown, autobots are eager to roll out and easily drown in context. So much so that `User` and `stdout` get confused.

franktankbank 3 hours ago | parent | prev [-]

Maybe the agents should require some sort of input start token: "simon says"

8note 4 hours ago | parent | prev [-]

it makes you wonder how many times people have incorrectly followed those recommended commands

bravetraveler 3 hours ago | parent [-]

If more than once (individually), I am concerned.