xg15 5 hours ago

That's something I'm wondering as well. Not sure how it is with frontier models, but from what you can see on Huggingface, the "standard" method to distinguish tokens still seems to be special delimiter tokens or even just plain formatting.

Are there technical reasons why you can't make the "source" of a token (system prompt, user prompt, model thinking output, model response output, tool call, tool result, etc.) part of its feature vector - or even treat each source as a different "modality"?

Or is this already being done in larger models?
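
Concretely, the "part of the feature vector" idea could look like segment-type embeddings: a second embedding table indexed by a token's source, summed into its token embedding before the transformer sees it. A minimal numpy sketch of that idea (the table sizes and the source-id mapping here are hypothetical, not any particular model's):

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 1000
NUM_SOURCES = 4   # hypothetical mapping: 0=system, 1=user, 2=assistant, 3=tool
D_MODEL = 8

# Two (in practice, learned) embedding tables: token identity and token source.
token_emb = rng.normal(size=(VOCAB_SIZE, D_MODEL))
source_emb = rng.normal(size=(NUM_SOURCES, D_MODEL))

def embed(token_ids, source_ids):
    """Input vector = token embedding + source ("segment") embedding."""
    return token_emb[token_ids] + source_emb[source_ids]

# The same token id (say, the token for "test") appearing in the user prompt
# (source 1) and in the model's own reply (source 2) now maps to two
# different input vectors, differing exactly by the two source embeddings.
tok = np.array([42, 42])
src = np.array([1, 2])
x = embed(tok, src)
```

This is the same mechanism as position or segment embeddings, so it adds essentially no architectural cost - the open question is what the model learns to do with it.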

jerf 2 hours ago | parent [-]

By the nature of the LLM architecture, I think if you "colored" the input via tokens, the model would about 85% "unlearn" the coloring anyway. Which is to say, it's going to figure out that "test" in the two different colors is the same thing. It kind of has to: you don't want to be talking about a "test" in your prompt and have the model be completely unable to connect that to the concept of "test" in its own replies. The coloring would end up as just another language in an already multi-language model. It might help slightly, but I doubt it would be a solution to the problem - and possibly at an unacceptable loss of capability, since the model would burn some of its capacity on that "unlearning".