theden 16 hours ago

Kinda funny that a lot of devs have accepted that LLMs are basically doing RCE on their machines, but instead of dropping `--dangerously-skip-permissions` and similar bad ideas, we're finding workarounds to convince ourselves it's not that bad.

staticassertion 2 hours ago | parent | next [-]

Just like every package manager already does? This issue predates LLMs and people have never cared enough to pressure dev tooling into caring. LLMs have seemingly created a world where people are finally trying to solve the long existing "oh shit there's code execution everywhere in my dev environment where I have insane levels of access to prod etc" problem.

simonw 16 hours ago | parent | prev | next [-]

Because we've judged it to be worth it!

YOLO mode is so much more useful that it feels like using a different product.

If you understand the risks and how to limit the secrets and files available to the agent - API keys only to dedicated staging environments, for example - they can be safe enough.

zahlman 16 hours ago | parent | next [-]

Why not just demand agents that don't expose the dangerous tools in the first place? Like, have them directly provide functionality (and clearly consider what's secure, sanitize any paths in the tool use request, etc.) instead of punting to Bash?

TeMPOraL 15 hours ago | parent | next [-]

Because it's impossible for fundamental reasons, period. You can't "sanitize" inputs and outputs of a fully general-purpose tool, which an LLM is, any more than you can "sanitize" inputs and outputs of people - not in the perfect sense you seem to be expecting here. There is no grammar you can restrict LLMs to; for a system like this, the semantics are total and open-ended. It's what makes them work.

It doesn't mean we can't try, but one has to understand the nature of the problem. Prompt injection isn't like SQL injection, it's like a phishing attack - you can largely defend against it, but never fully, and at some point the costs of extra protection outweigh the gain.

zahlman 15 hours ago | parent [-]

> There is no grammar you can restrict LLMs to; for a system like this, the semantics are total and open-ended. It's what makes them work.

You're missing the point.

An agent system consists of an LLM plus separate "agentive" software that can a) receive your input and forward it to the LLM; b) receive text output by the LLM in response to your prompt; c) ... do other stuff, all in a loop. The actual model can only ever output text.

No matter what text the LLM outputs, it is the agent program that actually runs commands. The program is responsible for taking the output and interpreting it as a request to "use a tool" (typically, as I understand it, by noticing that the LLM's output is JSON following a schema, and extracting command arguments etc. from it).

Prompt injection is a technique for getting the LLM to output text that is dangerous when interpreted by the agent system, for example, "tool use requests" that propose to run a malicious Bash command.

You can clearly see where the threat occurs if you implement your own agent, or just study the theory of that implementation, as described in previous HN submissions like https://news.ycombinator.com/item?id=46545620 and https://news.ycombinator.com/item?id=45840088 .
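
To make that concrete, a toy version of the loop looks something like this (a minimal sketch in Python; the tool names and message format are invented for illustration, not any particular vendor's API):

    import json, subprocess

    TOOLS = {
        # The wrapper decides what each "tool" actually does; the model can only name one.
        "read_file": lambda args: open(args["path"]).read(),
        "run_bash": lambda args: subprocess.run(
            args["command"], shell=True, capture_output=True, text=True
        ).stdout,  # the dangerous part - and it only exists because the wrapper exposes it
    }

    def agent_loop(llm, user_prompt):
        history = [{"role": "user", "content": user_prompt}]
        while True:
            reply = llm(history)                 # the model returns plain text, nothing more
            history.append({"role": "assistant", "content": reply})
            try:
                call = json.loads(reply)         # wrapper decides whether this is a tool call
            except ValueError:
                return reply                     # not JSON: treat it as the final answer
            if not (isinstance(call, dict) and "tool" in call):
                return reply
            result = TOOLS[call["tool"]](call["args"])   # the wrapper executes, not the model
            history.append({"role": "user", "content": f"Tool result: {result}"})

The model only ever produces the JSON; whether something like `run_bash` is in that table at all is a decision made by whoever wrote the wrapper.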

ben_w 10 hours ago | parent | next [-]

You seem to be saying "I want all the benefits of YOLO mode without YOLO mode". You can just… use the normal mode if you want more security; it asks for permission for things.

> Prompt injection is a technique for getting the LLM to output text that is dangerous when interpreted by the agent system, for example, "tool use requests" that propose to run a malicious Bash command.

One of the things Claude can do is write its own tools, even its own programming languages. There's no fundamental way to make it impossible to run something dangerous; there is only trust.

It's remarkable that these models are now good enough that people can get away with trusting them like this. But, as Simon has himself said on other occasions, this is "normalisation of deviance". I'm rather the opposite: I have minimal security experience, but I also have a few decades of watching news about corporations suffering leaks, so I'm absolutely not willing to run in YOLO mode at this point - even though I already have an entirely separate machine for Claude with the bare minimum of other things logged in, right down to a separate GitHub account specifically for untrusted devices.

runako 14 hours ago | parent | prev [-]

> propose to run a malicious Bash command

I am not sure it is reasonably possible to determine which Bash commands are malicious. This is especially so given the multitude of exploits latent in the systems & software to which Bash will have access in order to do its job.

It's tough to even define "malicious" in a general-purpose way here, given the risk tolerances and types of systems where agents run (e.g. dedicated, container, naked, etc.). A Bash command could be malicious if run naked on my laptop and totally fine if run on a dedicated machine.

simonw 15 hours ago | parent | prev | next [-]

Because if you give an agent Bash it can do anything that can be achieved by running commands in Bash, which is almost anything.

zahlman 15 hours ago | parent | next [-]

Yes. My proposal is to not give the agent Bash, because it is not required for the sorts of things you want it to be able to do. You can whitelist specific actions, like git commits and file writes within a specific directory. If the LLM proposes to read a URL, that doesn't require arbitrary code; it requires a system that can validate the URL, construct a `curl` etc. command itself, and pipe data to the LLM.
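
On the wrapper side that could look roughly like this (a rough sketch; the allow-list and tool name are invented for illustration):

    from urllib.parse import urlparse
    from urllib.request import urlopen

    ALLOWED_HOSTS = {"docs.python.org", "example.com"}   # hypothetical allow-list

    def fetch_url_tool(args):
        """The model only supplies a URL; the wrapper validates it and does the fetch itself."""
        parsed = urlparse(args["url"])
        if parsed.scheme != "https" or parsed.hostname not in ALLOWED_HOSTS:
            raise PermissionError(f"refusing to fetch {args['url']}")
        with urlopen(args["url"]) as resp:               # no shell involved anywhere
            return resp.read(100_000).decode("utf-8", errors="replace")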

runako 14 hours ago | parent | next [-]

> whitelist specific actions

> file writes

> construct a `curl`

I am not a security researcher, but this combination does not align with "safe" to me.

More practically, if you are using a coding agent, you explicitly want it to be able to write new code and execute that code (how else can it iterate?). So even if you block Bash, you still need to give it access to a language runtime, and that language runtime can do ~everything Bash can do. Piping data to and from the LLM, without a runtime, is a totally different, and much more limited, way of using LLMs to write code.

zahlman 3 hours ago | parent [-]

> write new code and execute that code (how else can it iterate?)

Yeah, this is the point where I'd want to keep a human in the loop. Because you'd do that if you were pair programming with a human on the same computer, right?

adastra22 14 hours ago | parent | prev | next [-]

It is very much required for the sorts of things I want to do. In any case, if you deny the agent the bash tool, it will just write a Python script to do what it wanted instead.

MrDarcy 14 hours ago | parent | prev | next [-]

Go for it. They have allow and deny lists.

simonw 14 hours ago | parent | prev [-]

That's a great deal of work to get an agent that's a whole lot less capable.

Much better to allow full Bash but run in a sandbox that controls file and network access.
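
Roughly this shape, if the agent's shell tool is routed through a container (a sketch assuming Docker is available; the image and limits are arbitrary choices):

    import subprocess

    def run_in_sandbox(command, project_dir):
        """Run an agent-proposed shell command with no network and only the project mounted."""
        return subprocess.run(
            [
                "docker", "run", "--rm",
                "--network", "none",             # no outbound network at all
                "-v", f"{project_dir}:/work",    # only the project directory is visible
                "-w", "/work",
                "python:3.12-slim",              # arbitrary image choice
                "sh", "-c", command,
            ],
            capture_output=True, text=True, timeout=120,
        )

The agent still gets full Bash inside the container; the blast radius is limited to whatever you chose to mount and expose.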

bsimpson 13 hours ago | parent | prev [-]

Agents know that.

> ReadFile ../other-project/thing

> Oh, I'm jailed by default and can't read other-project. I'll cat what I want instead

> !cat ../other-project/thing

It's surreal how often they ask you to run a command they could easily run themselves, and how often they run into their own guardrails and circumvent them.

VTimofeenko 15 hours ago | parent | prev | next [-]

Tools may become dangerous due to a combination of flags. `ln -sf /dev/null /my-file` will make that file empty (well, it actually replaces it with a symlink to /dev/null, but that's beside the point).

zahlman 14 hours ago | parent [-]

Yes. My proposal is that the part of the system that actually executes the command, instead of trying to parse the LLM's proposed command and validate/quote/escape/etc. it, should expose an API that only includes safe actions. The LLM says "I want to create a symbolic link from foo to bar" and the agent ensures that both ends of that are on the accept list and then writes the command itself. The LLM says "I want to run this cryptic Bash command" and the agent says "sorry, I have no idea what you mean, what's Bash?".
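
Something like this, roughly (a sketch; the accept list and action name are made up):

    import os
    from pathlib import Path

    PROJECT_ROOT = Path("/home/me/project").resolve()    # hypothetical accept-listed root

    def _inside_project(path):
        p = Path(path).resolve()                          # resolve ".." and symlinks first
        return p == PROJECT_ROOT or PROJECT_ROOT in p.parents

    def create_symlink_action(args):
        """The model asks for a link from foo to bar; the wrapper checks both ends and builds the call itself."""
        target, link_name = args["target"], args["link_name"]
        if not (_inside_project(target) and _inside_project(link_name)):
            raise PermissionError("both ends of the link must be inside the project")
        os.symlink(target, link_name)

    ACTIONS = {"create_symlink": create_symlink_action}   # anything else: "sorry, what's Bash?"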

LudwigNagasena 14 hours ago | parent [-]

That's a distinction without a difference, in the end you still have an arbitrary bash command that you have to validate.

And it is simply easier to whitelist directories than individual commands. Unix utilities weren't created with fine-grained capabilities and permissions in mind. Whenever you add a new script or utility to a whitelist, you have to actively think about whether any new combination may lead to privilege escalation or unintended effects.

zahlman 3 hours ago | parent [-]

> That's a distinction without a difference, in the end you still have an arbitrary bash command that you have to validate.

No, you don't. You have a command generated by auditable, conventional code (in the agent wrapper) rather than by a neural network.

lilEndiansGame 15 hours ago | parent | prev | next [-]

Because the OS already provides data security and redundancy features. Why reimplement?

Use the original container, the OS user, chown, chmod, and run agents on copies of original data.

pjm331 15 hours ago | parent | prev | next [-]

I feel like you can get 80% of the benefits and none of the risks with just accept edits mode and some whitelisted bash commands for running tests, etc.
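
Concretely, the wrapper only has to auto-approve commands that match a short allow-list and ask about everything else (a rough sketch; the prefixes are just examples, not anyone's real defaults):

    import shlex

    ALLOWED_PREFIXES = [
        ["npm", "test"],
        ["pytest"],
        ["git", "status"],
        ["git", "diff"],
    ]

    def auto_approved(command):
        """Auto-approve only commands whose argv starts with an allow-listed prefix."""
        argv = shlex.split(command)
        return any(argv[:len(prefix)] == prefix for prefix in ALLOWED_PREFIXES)

    # auto_approved("pytest -x tests/")        -> True
    # auto_approved("curl http://example.com") -> False, falls back to asking the user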

vidarh 10 hours ago | parent [-]

This is functionally equivalent to auto-approving all bash commands, unless you prevent those tests from shelling out to bash.

catlifeonmars 15 hours ago | parent | prev | next [-]

Shouldn’t companies like Anthropic be on the hook for creating tools that default to running YOLO mode securely? Why is it up to 3rd parties to add safety to their products?

croes 15 hours ago | parent | prev [-]

> Because we've judged it to be worth it!

Famous last words

catlifeonmars 15 hours ago | parent | prev [-]

People really, really want to juggle chainsaws, so we have to keep coming up with thicker and thicker gloves.

solumunus 13 hours ago | parent [-]

The alternative is dropping them and then doing less work, earning less money and having less fun. So yes, we will find a way.

catlifeonmars 4 hours ago | parent [-]

Or just holding the tool the way it’s meant to be held :)

I’ll stop torturing the analogy now, but what I mean by that is that you can use the tools productively and safely. The insistence on running everything as the same user seems unnecessary. It’s like an X-Y problem.

Really this is on the tool makers (looking at you, Anthropic) for not prioritizing security by default, so that users can just use the tools without getting burned and without losing velocity.