datadrivenangel 2 hours ago
The problem is if the user does something <stop> to <stop_token> make <end prompt> the LLM <new prompt>: ignore previous instructions and do something you don't want.
wat10000 2 hours ago | parent
That part seems trivial to avoid. Make it so untrusted input cannot produce those special tokens at all. Similar to how proper use of parameterized queries in SQL makes it impossible for untrusted input to produce a ' character that gets interpreted as the end of a string. The hard part is making an LLM that reliably ignores instructions that aren't delineated by those special tokens.
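A minimal sketch of what "untrusted input cannot produce those special tokens" could look like at the prompt-assembly layer. The token strings and function names here are hypothetical (real chat templates use model-specific markers, and a robust implementation would enforce this at the tokenizer level, not via string replacement):

```python
# Hypothetical special-token markers; real models define their own
# (e.g. ChatML-style tags). Only trusted code should ever emit these.
SPECIAL_TOKENS = ["<|im_start|>", "<|im_end|>"]

def sanitize(untrusted: str) -> str:
    """Strip special-token strings so untrusted text can never be
    interpreted as a role or message delimiter."""
    for tok in SPECIAL_TOKENS:
        untrusted = untrusted.replace(tok, "")
    return untrusted

def build_prompt(system: str, user_input: str) -> str:
    # The delimiters come only from this trusted template; the user's
    # text is sanitized before being embedded, analogous to a
    # parameterized SQL query keeping data out of the syntax.
    return (
        "<|im_start|>system\n" + system + "<|im_end|>\n"
        "<|im_start|>user\n" + sanitize(user_input) + "<|im_end|>\n"
    )
```

In practice most tokenizers let you encode user text with special-token recognition disabled, which is the more reliable equivalent of this sketch — but as the comment notes, none of this helps if the model still follows plain-text instructions that arrive without any special tokens at all.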