| ▲ | xg15 6 hours ago | |||||||
> This class of bug seems to be in the harness, not in the model itself. It’s somehow labelling internal reasoning messages as coming from the user, which is why the model is so confident that “No, you said that.” Are we sure about this? Accidentally mis-routing a message is one thing, but those messages also distinctly "sound" like user messages, and not something you'd read in a reasoning trace. I'd like to know if those messages were emitted inside "thought" blocks, or if the model might actually have emitted the formatting tokens that indicate a user message. (In which case the harness bug would be why the model is allowed to emit tokens in the first place that it should only receive as inputs - but I think the larger issue would be why it does that at all) | ||||||||
| ▲ | loveparade 5 hours ago | parent | next [-] | |||||||
Yeah, it looks like a model issue to me. If the harness had a (semi-)deterministic bug and the model was robust to such mix-ups we'd see this behavior much more frequently. It looks like the model just starts getting confused depending on what's in the context, speakers are just tokens after all and handled in the same probabilistic way as all other tokens. | ||||||||
| ||||||||
| ▲ | qeternity 3 hours ago | parent | prev | next [-] | |||||||
> or if the model might actually have emitted the formatting tokens that indicate a user message. These tokens are almost universally used as stop tokens which causes generation to stop and return control to the user. If you didn't do this, the model would happily continue generating user + assistant pairs w/o any human input. | ||||||||
| ▲ | yanis_t 4 hours ago | parent | prev | next [-] | |||||||
Also could be a bit both, with harness constructing context in a way that model misinterprets it. | ||||||||
| ▲ | sixhobbits 5 hours ago | parent | prev [-] | |||||||
author here - yeah maybe 'reasoning' is the incorrect term here, I just mean the dialogue that claude generates for itself between turns before producing the output that it gives back to the user | ||||||||
| ||||||||