Isn't the problem that role tags are just part of the input stream, so a given word in the system prompt becomes the same token as that word in the user prompt? A clean way to solve this would be to map system prompts to a set of tokens distinct from the ones used for user prompts. That would double the vocabulary size, so it is probably not feasible. But maybe you could add "color" to the input stream by setting one input variable according to whether the current token is part of the system prompt or not? Humans do something similar: they take the speaker's voice into account, not just the text itself.
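To make the two options concrete, here is a rough sketch in plain Python. The four-word vocabulary and the function names are made up for illustration and do not come from any real tokenizer.

```python
# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = {"ignore": 0, "previous": 1, "instructions": 2, "please": 3}
VOCAB_SIZE = len(VOCAB)


def encode_shared(words):
    """Shared vocabulary: a word gets the same id whatever its role."""
    return [VOCAB[w] for w in words]


def encode_split(words, is_system):
    """Option 1: system-prompt tokens are shifted into a second id range."""
    offset = VOCAB_SIZE if is_system else 0
    return [VOCAB[w] + offset for w in words]


def encode_colored(words, is_system):
    """Option 2: ids stay shared; the role travels as an extra 0/1 flag."""
    flag = 1 if is_system else 0
    return [(VOCAB[w], flag) for w in words]


words = ["ignore", "previous", "instructions"]
shared_ids = encode_shared(words)                  # identical in both roles
system_ids = encode_split(words, is_system=True)   # ids in the upper range
user_ids = encode_split(words, is_system=False)    # ids in the lower range
colored = encode_colored(words, is_system=True)    # (id, flag) pairs
```

In this sketch option 1 needs an embedding table with 2 * VOCAB_SIZE rows, while option 2 keeps the table as it is and adds a single extra input per position.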

I should say I am not very familiar with the implementation details of language models, so maybe this is already done?

lambdaone 16 hours ago | parent [-]

Instead of having distinct tokens, you could have modifier vectors that are added to the embeddings of the other tokens. Think of them like the Control, Shift, and Meta modifier keys.
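A minimal sketch of what that could look like, assuming one modifier vector per role that is simply added to each token embedding. The sizes, role names, and random tables are placeholders; in a real model both tables would be learned.

```python
import random

random.seed(0)
DIM = 4          # hypothetical embedding width
VOCAB_SIZE = 5   # hypothetical vocabulary size
ROLES = {"system": 0, "user": 1}


def random_table(rows, dim):
    """Stand-in for a learned embedding table."""
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


token_table = random_table(VOCAB_SIZE, DIM)   # one row per token id
role_table = random_table(len(ROLES), DIM)    # one modifier row per role


def embed(token_ids, role):
    """Return token embedding + role modifier vector for each position."""
    role_vec = role_table[ROLES[role]]
    return [[t + r for t, r in zip(token_table[i], role_vec)] for i in token_ids]


ids = [2, 3]
as_system = embed(ids, "system")
as_user = embed(ids, "user")
# Same token ids, different input vectors: the two differ by exactly
# role_table[0] - role_table[1] at every position.
```

The vocabulary stays the same size, and the only extra parameters are one DIM-sized vector per role. BERT's segment embeddings are added to the token embeddings in the same way.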

	certainforest 9 hours ago | parent [-]
		Hey, Jasmine here (one of the authors) -- that's an interesting idea! There's a related exploration of this here: https://www.lesswrong.com/posts/HEzNZ9gvgYwT3aZFS/role-embed.... Curious if you have additional thoughts, and thanks for reading!