Coding Agent VMs on NixOS with Microvm.nix (michael.stapelberg.ch)
45 points by secure 3 days ago | 14 comments
the_harpia_io 2 hours ago

The sandbox-or-not debate is important but it's only half the picture. Even a perfectly sandboxed agent can still generate code with vulnerabilities that get deployed to production - SQL injection, path traversal, hardcoded secrets, overly permissive package imports.

The execution sandbox stops the agent from breaking out during development, but the real risk is what gets shipped downstream. Seeing more tools now that scan the generated code itself, not just contain the execution environment.
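
To make "scan the generated code itself" concrete, here is a minimal sketch in Python. It is not how any particular scanning tool works; the two checks, the `SECRET_NAME` pattern, and the `scan_source` helper are hypothetical and only cover two of the vulnerability classes mentioned above (hardcoded secrets and SQL built from dynamic strings).

```python
import ast
import re

# Hypothetical heuristic: variable names that look like credentials.
SECRET_NAME = re.compile(r"(password|passwd|secret|token|api_key)", re.IGNORECASE)


def scan_source(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, message) findings for two naive checks."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Check 1: a string literal assigned to a credential-looking name.
        if (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            for target in node.targets:
                if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                    findings.append(
                        (node.lineno, f"possible hardcoded secret: {target.id}")
                    )
        # Check 2: .execute() whose first argument is an f-string or a
        # string built with + or %, instead of a parameterized query.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
            and isinstance(node.args[0], (ast.JoinedStr, ast.BinOp))
        ):
            findings.append(
                (node.lineno, "possible SQL injection: query built from dynamic string")
            )
    return sorted(findings)


# Made-up example of agent-generated code to scan.
SAMPLE = '''
API_KEY = "sk-not-a-real-key"
def lookup(cur, name):
    cur.execute(f"SELECT * FROM users WHERE name = '{name}'")
'''

if __name__ == "__main__":
    for line, message in scan_source(SAMPLE):
        print(f"line {line}: {message}")
```

A check like this runs on the output regardless of where the agent ran, which is why it complements the sandbox rather than replacing it. Real scanners do far more (data-flow analysis, dependency checks), so treat this only as an illustration of the idea.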

NJL3000 20 minutes ago

A pair of containers felt a bit cheaper than a VM:

https://github.com/5L-Labs/amp_in_a_box

I was going to add Gemini / OpenCode Kilo next.

There is some upfront cost to define what endpoints to map inside, but it definitely adds a veneer of preventing the crazy…

0xcb0 35 minutes ago

I was looking for a way to isolate my agents in a more convenient way, and I really love your idea. I'm going to give this a try over the weekend and will report back.

But the one-time setup seems like a really fair investment for more secure development. Of course, this will not help with the problem of malicious code reaching production. But with a little overhead, I think it will make local development much more secure.

And you can automate it a lot. And it will be finally my chance to get more into NixOS :D

rootnod3 3 hours ago

That is quite an involved setup to get a costly autocomplete going.

Is that really where we are at? Just outsource convenience to a few big players that can afford the hardware? Just to save on typing and god forbid…thinking?

“Sorry boss, I can’t write code because cloudflare is down.”

Cyph0n an hour ago

Keep in mind that this setup is a one-time cost. Also, a lot of the code is related to configuring it the way the author wants it (via Home Manager).

Generally speaking, once you have a working NixOS config, incremental changes become extremely trivial, safe, and easy to roll back.

heliumtera 2 hours ago

Couldn't you replicate all of your setup with qemu microvm?

Without nix I mean

rictic 2 hours ago

Yep. What nix adds is a declarative and reproducible way to build customized OS images to boot into.
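
For comparison, the non-Nix route could look roughly like the sketch below: a small Python helper that assembles a QEMU command line using the `microvm` machine type. The flags follow the example in the QEMU microvm documentation; the `microvm_argv` helper and the `vmlinux` / `rootfs.img` paths are placeholders, and you would still have to build the kernel and root filesystem image yourself, which is the part Nix makes declarative.

```python
import shlex


def microvm_argv(kernel: str, rootfs: str, mem_mib: int = 512, cpus: int = 2) -> list[str]:
    """Build an argv list for booting a QEMU microvm guest (sketch only)."""
    return [
        "qemu-system-x86_64",
        "-M", "microvm",                      # minimal machine type, virtio-mmio devices
        "-enable-kvm", "-cpu", "host",        # assumes KVM is available on the host
        "-m", f"{mem_mib}M",
        "-smp", str(cpus),
        "-kernel", kernel,
        "-append", "console=ttyS0 root=/dev/vda",
        "-nodefaults", "-no-user-config", "-nographic",
        "-serial", "stdio",
        "-drive", f"id=root,file={rootfs},format=raw,if=none",
        "-device", "virtio-blk-device,drive=root",
    ]


if __name__ == "__main__":
    # Placeholder paths; print the command instead of running it.
    print(shlex.join(microvm_argv("vmlinux", "rootfs.img")))
```

Networking and shared directories are left out here; each adds more flags that have to be kept in sync by hand.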

CuriouslyC 42 minutes ago

Nix is the best answer to "works on my machine," which is a problem I've seen at pretty much every place I've ever worked.

clawsyndicate 3 days ago

we run ~10k agent pods on k3s and went with gvisor over microvms purely for density. the memory overhead of a dedicated kernel per tenant just doesn't scale when you're trying to pack thousands of instances onto a few nodes. strict network policies and pid limits cover most of the isolation gaps anyway.
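
For readers who have not set this up, a rough sketch of what "gVisor plus strict network policies" can look like as Kubernetes manifests, generated from Python. This is not the poster's actual configuration. The names (`gvisor`, `agent-egress`, the `agents` namespace), the `runsc` handler name, and the `169.254.169.254` metadata address are assumptions that depend on the node and cloud setup, and the policy only has an effect if the cluster enforces NetworkPolicy.

```python
import json

# Assumption: the link-local address commonly used for cloud metadata services.
METADATA_CIDR = "169.254.169.254/32"


def gvisor_runtime_class(name: str = "gvisor") -> dict:
    """RuntimeClass that maps pods to the gVisor (runsc) runtime handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": "runsc",  # assumption: handler name configured in containerd
    }


def egress_policy(namespace: str, pod_labels: dict) -> dict:
    """Deny all ingress; allow egress everywhere except the metadata address."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "agent-egress", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            # Listing Ingress with no ingress rules denies all inbound traffic.
            "policyTypes": ["Ingress", "Egress"],
            "egress": [
                {"to": [{"ipBlock": {"cidr": "0.0.0.0/0", "except": [METADATA_CIDR]}}]}
            ],
        },
    }


def agent_pod(name: str, namespace: str, image: str, labels: dict) -> dict:
    """Pod that runs under gVisor and does not mount a service account token."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "runtimeClassName": "gvisor",
            "automountServiceAccountToken": False,
            "containers": [{"name": "agent", "image": image}],
        },
    }


if __name__ == "__main__":
    labels = {"app": "agent"}
    manifests = [
        gvisor_runtime_class(),
        egress_policy("agents", labels),
        agent_pod("agent-1", "agents", "python:3.12-slim", labels),
    ]
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

PID limits are not shown because they are usually set on the node rather than in the pod manifest.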

secure 3 days ago

Yeah, when you run ≈10k agents instead of ≈10, you need a different solution :)

I’m curious what gVisor is getting you in your setup — of course gVisor is good for running untrusted code, but would you say that gVisor prevents issues that would otherwise make the agent break out of the kubernetes pod? Like, do you have examples you’ve observed where gVisor has saved the day?

zeroxfe 2 hours ago

I've used both gVisor and microvms for this (at very large scales), and there are various tradeoffs between the two.

The huge gVisor drawback is that it *drastically* slows down applications (despite startup time being faster).

For agents, the startup time latency is less of an issue than the runtime cost, so microvms perform a lot better. If you're doing this in kube, then there's a bunch of other challenges to deal with if you want standard k8s features, but if you're just looking for isolated sandboxes for agents, microvms work really well.

clawsyndicate 3 days ago

since we allow agents to execute arbitrary python, we treat every container as hostile. we've definitely seen logs of agents trying to crawl /proc or hit the k8s metadata api. gvisor intercepts those syscalls so they never actually reach the host kernel.

rootnod3 2 hours ago

And you see no problem in that at all? Just “throw a box around it and let the potentially malicious code run”?

Wait until they find a hole. Then good luck.

dist-epoch 2 hours ago

LXC containers inside a VM scale. Bonus point: LXC containers feel like a VM.