alkonaut 5 hours ago

When you use LLMs through an API, at least, the history appears as a JSON list of entries, each tagged as coming from the user, from the LLM, or as a system prompt.
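For reference, such a history typically looks something like this (field names follow the common OpenAI-style convention; other APIs differ in detail):

```python
# A minimal sketch of the role-tagged history most chat APIs accept.
# Field names follow the OpenAI-style convention; other APIs vary.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "thing B"},
    {"role": "assistant", "content": "thing A"},
]

# The client just appends new turns and resends the whole list.
history.append({"role": "user", "content": "follow-up question"})
```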

So presumably (if we assume there isn't a bug where the sources are ignored in the CLI app), the problem is that encoding this state for the LLM isn't reliable. I.e. it gets what is effectively

LLM said: thing A
User said: thing B

And it still manages to blur that somehow?

jasongi 4 hours ago | parent [-]

Someone correct me if I'm wrong, but an LLM does not interpret structured content like JSON. Everything is fed into the machine as tokens, even JSON. So your structure that says "human says foo" and "computer says bar" is not deterministically interpreted by the LLM as logical statements, but as a sequence of tokens. And when the context contains a LOT of those sequences, especially further "back" in the window, that is where this "confusion" occurs.
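To make that concrete, here is a rough sketch of the serialization step that happens before tokenization. The delimiter strings below are made up for illustration (real chat templates such as ChatML use reserved special tokens), but the point stands: the model ultimately sees one flat string, and the role boundaries are just more tokens in it:

```python
def flatten(messages):
    """Render a role-tagged message list into the single flat string
    the model is actually tokenized on. The delimiters here are
    illustrative, not any real chat template."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}\n<|end|>")
    return "\n".join(parts)

messages = [
    {"role": "user", "content": "human says foo"},
    {"role": "assistant", "content": "computer says bar"},
]

prompt = flatten(messages)
# Nothing structurally enforces the boundaries: if a "<|user|>"-like
# string appears *inside* a message's content, it looks exactly like
# a real turn marker to the model.
```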

I don't think the problem here is a bug in Claude Code. It's an inherent property of LLMs that context further back in the window has less impact on future tokens.

Like all the other undesirable aspects of LLMs, maybe this gets "fixed" in CC by trying to get the LLM to RAG its own conversation history instead of relying on it recalling who said what from context. But you can never "fix" LLMs being a next token generator... because that is what they are.
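As a purely hypothetical sketch of what that workaround could look like (bag-of-words overlap standing in for a real embedding model), the idea would be to retrieve only the past turns relevant to the current question, rather than replaying the entire transcript into the context window:

```python
def score(query, text):
    """Crude relevance score via word overlap. A real system would
    use embeddings; this is only to illustrate the retrieval idea."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t)

def retrieve(history, query, k=2):
    """Pick the k most relevant past turns to put back into context."""
    ranked = sorted(history, key=lambda m: score(query, m["content"]),
                    reverse=True)
    return ranked[:k]

history = [
    {"role": "user", "content": "set the timeout to 30 seconds"},
    {"role": "assistant", "content": "timeout updated to 30 seconds"},
    {"role": "user", "content": "now rename the config file"},
]

relevant = retrieve(history, "what timeout did I ask for")
```

The trade-off is that retrieval can miss context that mattered but didn't match the query, so this moves the failure mode around rather than eliminating it.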

afc 4 hours ago | parent | next [-]

That's exactly my understanding as well. This is, essentially, the LLM hallucinating user messages nested inside its outputs. FWIW I've seen Gemini do this frequently (especially on long agent loops).

coffeefirst 4 hours ago | parent | prev [-]

I think that’s correct. There seem to be a lot of fundamental limitations that have been “fixed” through a boatload of reinforcement learning.

But that doesn’t make them go away, it just makes them less glaring.