Remix clone Hacker News

new | show | ask | jobs Github

	▲	andy99 an hour ago
		Anthropic injects text into the conversation triggered by certain conversation topics. This happened to me in relation to some red-teaming related discussion that was adjacent to something “sensitive”, I think sex, and Claude got confused about why I had said some kind of warning and mentioned it it’s response. After a back and forth it was clear that some extra warning to answer but avoid anything inappropriate had been inserted into the conversation.
	▲	wongarsu 43 minutes ago \| parent [-]
		Claude also sometimes mentions getting messages from classifiers, probably related to auto mode. Amusingly enough, when this happens to a subagent/fork, the orchestrator will call these " hallucinations by the subagent"