▲ | brap 4 days ago | |
The problem is that once you load a tool’s response into context, there’s no telling what the LLM will do. You can escape it all you want, but maybe it contains the right magic words you haven’t thought of. The solution is to not load it into context at all. I’ve seen a proposal for something like this but I can’t find it (I think from Google?). The idea is (if I remember it correctly) to spawn another dedicated (and isolated) LLM that would be in charge of the specific response. The main LLM would ask it questions and the answers would be returned as variables that it may then pass around (but it can’t see the content of those variables). Edit: found it. https://arxiv.org/abs/2503.18813 Then there’s another problem: how do you make sure the LLM doesn’t leak anything sensitive via its tools (not just the payload, but the commands themselves can encode information)? I think it’s less of a threat if you solve the first problem, but still… I didn’t see a practical solution for this yet. | ||
▲ | eranation 3 days ago | parent [-] | |
Thanks for the link to the article, very interesting! |