Remix.run Logo
LoganDark 2 hours ago

Someone needs to train a model where untrusted input uses a completely different set of tokens so that it's entirely impossible for the model to confuse them with instructions. I've never even seen that approach mentioned let alone implemented.

jorl17 an hour ago | parent [-]

Perhaps this is in line with what you had in mind? https://patents.google.com/patent/US12118471

LoganDark 5 minutes ago | parent [-]

> The input is represented as tokens, wherein the trusted instructions and the untrusted instructions are represented using incompatible token sets.

Yes, exactly!