wslh 10 days ago
100% agree. I can’t find all the sources right now, but [1] and its references could be a good starting point for further exploration. I recall a proof or conjecture suggesting that it’s impossible to build an "LLM firewall" capable of protecting against all possible prompts, though my memory might be failing me.