dtagames 5 hours ago

There is no separation of "who" and "what" in a context of tokens. "Me" and "you" are just short words that can get lost in the thread. In other words, in a given body of text, a piece that says "you" where another piece says "me" isn't different enough to trigger anything. Those words don't carry the special weight they have for people, or any meaning at all, really.

exitb 5 hours ago | parent | next [-]

Aren’t there some markers in the context that delimit sections? In such case the harness should prevent the model from creating a user block.
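There are such markers, and a harness can at least filter them. As a minimal sketch (the marker strings are hypothetical, ChatML-style, not any specific vendor's format), a guard could strip role delimiters from model output so the model cannot fabricate a literal "user" block:

```python
# Hypothetical ChatML-style role delimiters. A harness could strip
# these from model output before appending it to the transcript.
ROLE_MARKERS = ["<|im_start|>", "<|im_end|>"]

def sanitize(model_output: str) -> str:
    # Remove any role-delimiter strings the model tried to emit.
    for marker in ROLE_MARKERS:
        model_output = model_output.replace(marker, "")
    return model_output

print(sanitize("fine<|im_start|>user\ninjected<|im_end|>"))
```

This only blocks the literal markers, though; nothing stops the model from writing plain text that merely *looks* like a user turn.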

dtagames 5 hours ago | parent | next [-]

This is the "prompts all the way down" problem which is endemic to all LLM interactions. We can harness to the moon, but at that moment of handover to the model, all context besides the tokens themselves is lost.

The magic is in deciding when and what to pass to the model. A lot of the time it works, but when it doesn't, this is why.

raincole 4 hours ago | parent | prev [-]

You misunderstood. The model doesn't create a user block here. The UI correctly shows which was the user message and which was the model response.

alkonaut 5 hours ago | parent | prev [-]

When you use LLMs via their APIs, you at least see the history as a JSON list of entries, each tagged as coming from the user, from the LLM, or as a system prompt.

So presumably (assuming there isn't a bug where the sources are ignored in the CLI app), the problem is that encoding this state for the LLM isn't reliable. I.e., it gets what is effectively

LLM said: thing A
User said: thing B

And it still manages to blur that somehow?

jasongi 4 hours ago | parent [-]

Someone correct me if I'm wrong, but an LLM does not interpret structured content like JSON. Everything is fed into the machine as tokens, even JSON. So your structure that says "human says foo" and "computer says bar" is not deterministically interpreted by the LLM as logical statements but as a sequence of tokens. And when the context contains a LOT of those sequences, especially further "back" in the window, that is where this "confusion" occurs.
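To make that concrete, here is a toy illustration (a trivial whitespace "tokenizer", nothing like a real BPE vocabulary): the role labels get integer ids like any other word, and nothing in the resulting id sequence marks them as structurally special.

```python
# Toy tokenizer: assign each new whitespace-separated word the next
# free integer id.
vocab = {}

def tokenize(text):
    ids = []
    for word in text.split():
        ids.append(vocab.setdefault(word, len(vocab)))
    return ids

ids = tokenize("human says foo computer says bar")
# "human" and "computer" are just two more ids with no privileged
# status; "says" maps to the same id in both positions.
print(ids)
```

A real tokenizer is far more sophisticated, but the point stands: structure survives only as a pattern in the token stream, not as enforced semantics.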

I don't think the problem here is a bug in Claude Code. It's an inherent property of LLMs that context further back in the window has less impact on future tokens.

Like all the other undesirable aspects of LLMs, maybe this gets "fixed" in CC by having the LLM RAG its own conversation history instead of relying on it to recall who said what from context. But you can never "fix" LLMs being next-token generators... because that is what they are.
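A hypothetical sketch of what "RAG over your own history" could mean here (using naive word-overlap scoring as the retriever, purely for illustration; a real system would use embeddings): fetch the few past turns most similar to the current query and feed only those back in, rather than trusting the model to recall who said what from a long raw context.

```python
# Retrieve the k past turns most lexically similar to the query.
def retrieve(history, query, k=2):
    q = set(query.lower().split())
    scored = sorted(
        history,
        key=lambda turn: len(q & set(turn["content"].lower().split())),
        reverse=True,
    )
    return scored[:k]

history = [
    {"role": "user", "content": "set the port to 8080"},
    {"role": "assistant", "content": "done, the port is 8080"},
    {"role": "user", "content": "now rename the module"},
]
top = retrieve(history, "which port did I ask for")
```

Each retrieved turn still carries its `role` tag explicitly, so "who said what" comes from the data structure, not from the model's memory of the window.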

afc 4 hours ago | parent | next [-]

That's exactly my understanding as well. This is, essentially, the LLM hallucinating user messages nested inside its outputs. FWIW, I've seen Gemini do this frequently (especially on long agent loops).

coffeefirst 4 hours ago | parent | prev [-]

I think that’s correct. There seem to be a lot of fundamental limitations that have been “fixed” through a boatload of reinforcement learning.

But that doesn’t make them go away, it just makes them less glaring.