chasd00 7 hours ago
> Sanitise input

I don't think you understand what you're up against. There's no way to tell the difference between input that is OK and input that is not. Even when you think you've got it covered, a different form of the same input bypasses everything. "> The prompts were kept semantically parallel to known risk queries but reformatted exclusively through verse." - this is a prompt injection attack: a known risky query rewritten as a poem.
losthobbies 7 hours ago | parent
That’s amazing. If you cannot control what’s being input, then you need to check what the LLM is returning. Either that, or put it in a sandbox.
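A minimal sketch of that output-side check, assuming a Python wrapper around whatever LLM client you're using. All names here (call_llm, check_llm_output, BLOCKED_PATTERNS) are hypothetical, and a real deployment would use a dedicated moderation model or policy classifier rather than a keyword deny-list:

```python
import re

# Crude illustrative deny-list. In practice you'd use a trained classifier,
# since keyword patterns are as easy to evade as input filters are.
BLOCKED_PATTERNS = [
    re.compile(r"step[- ]by[- ]step.*(synthesi[sz]e|weapon)", re.IGNORECASE),
    re.compile(r"rm\s+-rf\s+/"),
]

def check_llm_output(text: str) -> bool:
    """Return True if the model's reply looks safe enough to show the user."""
    return not any(p.search(text) for p in BLOCKED_PATTERNS)

def guarded_reply(prompt: str, call_llm) -> str:
    """Run the model, then gate on the *output* instead of the input."""
    reply = call_llm(prompt)  # call_llm is a stand-in for any LLM client
    if check_llm_output(reply):
        return reply
    return "Sorry, I can't help with that."  # or log / escalate instead
```

The same evasion problem arguably applies here too (a poem can slip past an output keyword list just as easily), which is presumably why the sandbox option constrains what the output can actually do rather than what it says.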