Reminds me of a session I had recently (on web!) where claude insisted that i prefixed all my messages with statements about code execution or something, which was not the case. I interrogated it about that and it confirmed that it came from somewhere else, but could not get rid of it and each response mentioned that its gonna ignore those instructions. Eerie.

▲

andy99 an hour ago | parent | next [-]

Anthropic injects text into the conversation triggered by certain conversation topics. This happened to me in relation to some red-teaming related discussion that was adjacent to something “sensitive”, I think sex, and Claude got confused about why I had said some kind of warning and mentioned it it’s response. After a back and forth it was clear that some extra warning to answer but avoid anything inappropriate had been inserted into the conversation.

	▲	wongarsu 43 minutes ago \| parent [-]
		Claude also sometimes mentions getting messages from classifiers, probably related to auto mode. Amusingly enough, when this happens to a subagent/fork, the orchestrator will call these " hallucinations by the subagent"

▲

39 minutes ago | parent | prev [-]

[deleted]