There are two primary issues to solve:

1: Protecting against bad things (prompt injections, overeager agents, etc)

2: Containing the blast radius (preventing agents from even reaching sensitive things)

The companies building the agents make a best-effort attempt at #1 (guardrails, permissions, etc.) and do nothing about #2. That's why I use https://github.com/kstenerud/yoloai for everything now.

▲ AbanoubRodolf 3 hours ago | parent [-]

The blast radius problem is the one that actually gets exploited. Prompt injection defenses are fighting the model's core training to be helpful, so you're always playing catch-up. Blast radius reduction is a real engineering problem with actual solutions, and almost nobody applies them before something goes wrong.

The clearest example is in agent/tool configs. The standard setup grants filesystem write access across the whole working directory plus shell execution, because that's what the scaffolding demos need. Scoping down to exactly what the agent needs requires thinking through the permission model before deployment, which most devs skip.

A model that can only read specific directories and write to a staging area can still do 90% of the useful work. Any injection that lands just doesn't reach anything sensitive.
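A minimal sketch of that kind of scoping, assuming the tool layer checks every path before touching it. The `PathPolicy` class and the example directories are hypothetical and not taken from any particular agent framework:

```python
from pathlib import Path


class PathPolicy:
    """Default-deny path policy: read from listed roots, write only to staging."""

    def __init__(self, read_roots, write_roots):
        # Resolve once so later comparisons use normalized absolute paths.
        self.read_roots = [Path(p).resolve() for p in read_roots]
        self.write_roots = [Path(p).resolve() for p in write_roots]

    @staticmethod
    def _inside(path, root):
        # relative_to raises ValueError when path is not under root.
        try:
            path.relative_to(root)
            return True
        except ValueError:
            return False

    def can_read(self, path):
        # resolve() collapses ".." so traversal cannot escape a root.
        p = Path(path).resolve()
        return any(self._inside(p, r) for r in self.read_roots + self.write_roots)

    def can_write(self, path):
        p = Path(path).resolve()
        return any(self._inside(p, r) for r in self.write_roots)


# Hypothetical layout: source is readable, only the staging area is writable.
policy = PathPolicy(
    read_roots=["/srv/project/src"],
    write_roots=["/srv/project/staging"],
)
```

A check like this only helps if the agent has no other route to the filesystem, such as unrestricted shell execution, so it complements OS-level sandboxing and does not replace it.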

▲ kstenerud 2 hours ago | parent [-]

I've gone a step further:

- yoloai new mybugfix . -a # start a new sandbox using a copy of CWD as its workdir

- # tell the agent to fix the broken thing

- yoloai diff mybugfix # See a unified diff of what it did with its copy of the workdir

- yoloai apply mybugfix # apply specific git commits it made to the real workdir, or the whole diff - your choice

- yoloai destroy mybugfix

The diff/apply makes sure that the agent has NO write access to ANYTHING sensitive, INCLUDING your workdir. You decide what gets applied AFTER you review what crazy shit it did in its sandbox copy of your workdir.

Blast radius = 0
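The copy/diff/apply idea can be sketched in a few lines of Python. This is an illustration of the workflow only, not how yoloai implements it: the function names are made up, it assumes text files, and a plain directory copy provides no isolation on its own (the real tool runs the agent inside a container or VM).

```python
import difflib
import shutil
import tempfile
from pathlib import Path


def make_sandbox(workdir: Path) -> Path:
    """Copy the real workdir into a fresh temp directory for the agent."""
    sandbox = Path(tempfile.mkdtemp(prefix="sandbox-")) / "work"
    shutil.copytree(workdir, sandbox)
    return sandbox


def _files(root: Path) -> set:
    # Relative POSIX-style paths of every regular file under root.
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def _lines(path: Path) -> list:
    # A missing file is treated as empty, so adds and deletes show up in the diff.
    return path.read_text().splitlines(keepends=True) if path.is_file() else []


def diff_sandbox(workdir: Path, sandbox: Path) -> str:
    """Unified diff of everything the agent changed in its copy."""
    out = []
    for rel in sorted(_files(workdir) | _files(sandbox)):
        out.extend(
            difflib.unified_diff(
                _lines(workdir / rel), _lines(sandbox / rel), f"a/{rel}", f"b/{rel}"
            )
        )
    return "".join(out)


def apply_file(workdir: Path, sandbox: Path, rel: str) -> None:
    """After review, copy one approved file back (or delete it if the agent did)."""
    src, dst = sandbox / rel, workdir / rel
    if src.is_file():
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    elif dst.exists():
        dst.unlink()
```

The point of the design is that the real workdir is only ever written by `apply_file`, which a human calls after reading the output of `diff_sandbox`.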

▲ throwaway290 2 hours ago | parent [-]

But then you give the LLM access to the whole internet and any other tokens it needs, right? ;)

▲ kstenerud 2 hours ago | parent [-]

You can configure a network allow-list (for anything beyond what it absolutely requires in order to function).

yoloAI just leverages the sandboxing functionality that Docker, Kata, Firecracker, etc. already provide.

▲ throwaway290 an hour ago | parent [-]

Sorry. At this point it's just a meme how people give LLMs open access to the internet, literally all their passwords and all their tokens, and then are actually surprised when something bad happens: "but I run it in Docker".

Even if Docker sandbox escapes didn't exist, it's just chef's kiss.

▲ kstenerud 31 minutes ago | parent [-]

Yup, very irresponsible. And then the horror stories.

`yoloai new --network-isolated ...` ONLY agent API traffic allowed. Everything else gets blocked by iptables.

`yoloai new --network-allow api.example.com --network-allow cdn.example.org ...` ONLY agent API traffic + api.example.com and cdn.example.org. Everything else gets blocked by iptables.
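The policy those two invocations describe is default-deny with an explicit allow-list. A rough sketch of that logic, assuming exact hostname matching; the `AGENT_API_HOSTS` value and both function names are hypothetical, and in yoloai the enforcement is done by iptables rather than application code:

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for whatever endpoint the agent needs to function.
AGENT_API_HOSTS = frozenset({"agent-api.example.invalid"})


def build_allowlist(extra_hosts=()):
    """Agent API hosts plus any explicitly allowed extras; nothing else."""
    return AGENT_API_HOSTS | {h.lower().rstrip(".") for h in extra_hosts}


def is_allowed(url, allowlist):
    """Default deny: only exact hostname matches pass (no wildcard subdomains)."""
    host = urlsplit(url).hostname  # lowercased by urlsplit, None if absent
    return host is not None and host.rstrip(".") in allowlist


# Analogous to --network-isolated: agent API only.
isolated = build_allowlist()

# Analogous to --network-allow api.example.com --network-allow cdn.example.org.
custom = build_allowlist(["api.example.com", "cdn.example.org"])
```

Exact matching is deliberate here: a suffix or substring check would let a host like `api.example.com.evil.net` through.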