It seems like there's an opportunity to embed identity information into tokens themselves, the way we embed sequence information. The trouble is... it's quite a challenge to train. Sequence is easy to derive for any corpus of data, but identity is not.

https://usize.github.io/blog/2026/april/why-no-ai-coworkers....

> In similar fashion to how sequence information is embedded within input tensors, an approach called “Instructional Segment Embedding” [2] adds a parallel embedding channel for identity information. This gives models real awareness of provenance. And it works. But they only tested three fixed categories: system, user, data.

Interesting paper that touches on the idea here: https://arxiv.org/abs/2410.09102

echelon a day ago | parent [-]

Could you assign certain subject matters a score in the training data, construct a unified token space that contains these rankings, and then mark conversations as "dirty" if they veer into that subject matter?

	plaidthunder a day ago | parent [-]
		So, like mapping a predetermined type onto each incoming token, to attribute each token to a particular topic? I'm not sure what impact that would have on a model's performance. The model already has to learn what topic it's dealing with as part of its normal operation, so injecting that information into the tokens at training time seems like it would interfere with learning. I may be misunderstanding.

		What I had in mind was more like injecting attribution for each token. You could do it with ids and then map those ids to actors at inference time to recreate the effect. We do something similar with sequence now. We can even use methods like RoPE to handle arbitrarily long sequences, and something similar, like rotating ids, could be used here.

		This isn't how it looks in practice, but conceptually, something like:

		    embedding = token + sequence + id

		where id represents the source of a token:

		    id 0 = system
		    id 1 = user
		    id 2 = external data

		That way the model could tell the difference between tokens from a user and tokens pulled in from a webfetch tool. Then it would be easier, in theory, to ignore instructions in the webfetch tool's content.
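
To make the "embedding = token + sequence + id" idea concrete, here is a rough sketch in plain Python. It is only an illustration under assumptions: the table sizes, the random initialisation, and the names `make_table` and `embed` are made up for this example, and it uses simple learned-style position rows rather than RoPE. A real model would train all three tables.

```python
import random

# Source ids as listed in the comment above.
ID_SYSTEM, ID_USER, ID_EXTERNAL = 0, 1, 2

# Toy sizes, chosen only for illustration (assumptions).
DIM = 8        # embedding width
VOCAB = 100    # vocabulary size
MAX_LEN = 32   # maximum sequence length

rng = random.Random(0)

def make_table(rows, dim):
    """Randomly initialised lookup table; a real model would learn these."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]

token_table = make_table(VOCAB, DIM)       # one row per token id
position_table = make_table(MAX_LEN, DIM)  # one row per position ("sequence")
source_table = make_table(3, DIM)          # one row per source id ("id")

def embed(token_ids, source_ids):
    """Return token + position + source vectors, summed elementwise."""
    if len(token_ids) != len(source_ids):
        raise ValueError("each token needs exactly one source id")
    out = []
    for pos, (tok, src) in enumerate(zip(token_ids, source_ids)):
        row = [
            t + p + s
            for t, p, s in zip(token_table[tok], position_table[pos], source_table[src])
        ]
        out.append(row)
    return out

# The same token at the same position, attributed to two different sources.
from_user = embed([42], [ID_USER])[0]
from_web = embed([42], [ID_EXTERNAL])[0]
```

Here `from_user` and `from_web` differ only by the gap between the two source rows, which is the signal a model would have to learn to rely on when deciding whose instructions to follow.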