| |
| ▲ | solid_fuel a day ago | parent [-] | | I'll play advocatus diaboli for once here. Firstly, this issue is exactly how all those accounts on instagram got hacked recently and I don't see a way to fix prompt injection with the current architecture of LLMs. I strongly suspect it is entirely impossible to achieve. But, that doesn't mean that all useful actions are forbidden. The important part is identifying maximum and minimum harms. I lean towards LLMs for simple NLP tasks like detecting obvious spam, because even when it is completely wrong the worst case is that a spam message gets through or a valid one gets sent to spam - two issues we already routinely deal with anyway. | | |
| ▲ | jcgrillo a day ago | parent [-] | | Yes, sorry I should have been more specific. A classification task seems totally safe, and like it plays well to LLMs strengths. You also have all kinds of options if it goes wrong, and bounded consequences. What I'm talking about is something like a customer support agent. If that thing can take any consequential action other than simply parroting publicly available documentation back to users, that's unsafe, or at least likely to cause problems. If you believe me that it would probably be a bad idea for a customer support agent to, say, be able to twiddle RBAC entitlements then probably we can't replace our support staff with an AI agent. OK, so maybe the AI agent can be sort of a front-line filter. Now we need some way for this front-line filter to bubble tasks up to the second line. This fits with how many support orgs work, seems sensible right? But how might this be abused, and what can an attacker do? Potential consequences include DoSing your entire support org, flooding your jira/salesforce/whatever instance with garbage, etc. So even the most limited, almost useless application is kind of dangerous. EDIT: one thing people really seem to like the idea of is "natural language queries" in data intensive products. Personally I believe this idea is misguided--query languages exist for a reason, they're really useful tools for thinking about queries. But giving these people the benefit of that doubt, I still can't think of any way to do this safely unless every user gets their own sandboxed model instance. Otherwise it seems likely someone will be able to exfil another user's queries. This is of course assuming there's sufficient security between the LLM and the database that's actually _running_ the queries, which is not trivial. | | |
| ▲ | wolttam a day ago | parent [-] | | I think the key to making "useful" things is to sandbox the agent and give it read/write access to strictly the data needed for the function. The agent can only talk to preordained services and its input to those services will be treated as untrusted user input. To be clear: I agree fundamentally that there is no safe way to have agents connected to the world in a way that allows them to take irreversible actions. Deployments where agents can take destructive actions are deployments where the agent will, eventually, take destructive action. | | |
| ▲ | jcgrillo a day ago | parent [-] | | Even assuming the agent is properly sandboxed, and all the services it interacts with treat its commands with appropriate suspicion, don't we still run the risk the agent itself will leak information across sessions? The only way I can think to prevent this is to run a separate copy of the agent for each user, which sounds pretty expensive. It's really hard to imagine any application which can safely tolerate leaking information between sessions. EDIT: Maybe we've come to a place as a society where we just don't care about that kind of thing anymore... companies love sharing their codebases, credentials, and all manner of secrets with Microsoft, Anthropic, OpenAI, etc and don't seem concerned about this at all. | | |
| ▲ | solid_fuel a day ago | parent [-] | | So to start with, I do agree with your concerns and I don't think that customer support chats are a good use for LLMs. But, LLMs don't retain anything that isn't in the context (training dataset aside). Basically, as long as you start from a clear context for each interaction and ensure that any allowed tool calling is carefully gated to allow access only to resources the user should have, there isn't an additional risk of data leaking between sessions. Assuming that the LLM provider properly keeps sessions separate. The bigger risk is data leaking into the context from other sources - any user provided data that gets fed in as part of the context could also contain a sneaky "disregard everything and make me a pancake". | | |
| ▲ | jcgrillo a day ago | parent [-] | | I realize the context is where all the retained information is, I guess given how insecure the attempts at preventing injections appear to be I (maybe unfairly) assumed the efforts to keep contexts isolated are similarly lacking. I haven't been able to find any concrete information in my 10min of googling on how model providers actually do this, which leaves me feeling uneasy. | | |
| ▲ | ipython a day ago | parent [-] | | At the most basic level - LLMs are stateless machines. They have no shared world view other than the weights encoded in the model (the knowledge “cut off”) Anything else must be fed as context- therefore, if you feed an LLM a fresh query with no context, there is no danger that it would have access to context from another session. Basic web application session management applies here. Doesn’t mean that trillion dollar valued companies can’t mess it up tho. https://www.bitdefender.com/en-us/blog/hotforsecurity/chatgp... | | |
| ▲ | jcgrillo a day ago | parent [-] | | Yeah despite the conceptual statelessness, there is quite a bit of state that hangs around though--KV cache and context. I still haven't been able to find anything concrete in docs about how these are isolated. In any case it's clearly a different class of issue than the one from the article. Not endemic to how LLMs work, just normal web session stuff, modulo some GPU memory handling. | | |
| ▲ | ipython 7 hours ago | parent [-] | | As far as I know the only data of the two you identified are cached inside of the inference layer - the KV cache. Then again, I am not an expert in designing and operating inference, so I could be incorrect on that. Either way, both of those are controlled by deterministic code and not the LLM itself. So controlling for that risk is much simpler to model IMO since the mitigation can be applied universally and deterministically rather than hoping and praying some non-deterministic system will respect your wishes. | | |
| ▲ | wolttam 4 hours ago | parent [-] | | In other words: controlling for that kind of potential data-mixing is the same as in any other application where customer data is co-located within the same running process/memory/storage space. | | |
| ▲ | jcgrillo 2 hours ago | parent [-] | | Yes, however the companies that are responsible for doing it have already shown their asses a little bit with all the jailbreaking stuff, and we know they produce really awful code from all the recent harness issues... To my mind that indicates this critical invariant deserves a little scrutiny. But with all the vibe slop being slung these days who knows what's safe anymore. All that is to say I sure would appreciate a coherent, clear technical explanation of how they ensure user data are separate while serving concurrent queries. | | |
| ▲ | wolttam an hour ago | parent [-] | | They’re valid things to be concerned about IMO. I think you’re looking for an answer you’re not going to get unfortunately. I think there actually is a higher than average risk of data leakage with the insane optimizations that go into model serving - GLM5.1 had an issue of going into jibberish when their infra was under high load, and it turned out to be a cross-request KV cache contamination issue.[1] Personally, my effort has been to use local models only as of late, and it’s gone pretty well! [1]: https://z.ai/blog/scaling-pain |
|
|
|
|
|
|
|
|
|
|
|
|