This is a really interesting take on the sandboxing problem. This reminds me of an experiment I worked on a while back (https://github.com/imfing/jsrun), which embedded V8 into Python to allow running JavaScript with tightly controlled access to the host environment. Similar in goal to run untrusted code in Python.

I’m especially curious about where the Pydantic team wants to take Monty. The minimal-interpreter approach feels like a good starting point for AI workloads, but the long tail of Python semantics is brutal. There is a trade-off between keeping the surface area small (for security and predictability) and providing sufficient language capabilities to handle non-trivial snippets that LLMs generate to do complex tasks

▲

scolvin 18 hours ago | parent | next [-]

Can't be sure where this might end, but the primary goal is to enable codemode/programmatic tool calling, using the external function call mechanism for anything more complicated.

I think in the near term we'll add support for classes, dataclasses, datetime, json. I think that should be enough for many use cases.

▲

ushakov 18 hours ago | parent | prev [-]

there’s no way around VMs for secure, untrusted workloads. everything else, like Monty has too many tradeoffs that makes it non-viable for any real workloads

disclaimer: i work at E2B, opinions my own

▲

scolvin 18 hours ago | parent [-]

As discussed on twitter, v8 shows that's not true.

But to be clear, we're not even targeting the same "computer use" use case I think e2b, daytona, cloudflare, modal, fly.io, deno, google, aws are going after - we're aiming to support programmatic tool calling with minimal latency and complexity - it's a fundamentally different offering.

Chill, e2b has its use case, at least for now.

▲

fulafel 14 hours ago | parent | next [-]

There's been a constant stream of v8 VM sandbox escape discoveries since its dawn of course. Considering those have mostly existed for a long time before publication it's very porous most of the time.

And Python VM had/has its sandboxing features too, previously rexec and still https://github.com/zopefoundation/RestrictedPython - in the same category I'd argue.

Then there's of course hypervisor based virtualization and the vulnerabilities and VM escapes there.

Browsers use belt-and-suspenders approaches of employing both language runtime VMs and hardware memory protection as layers to some effect, but still are the star act at pwn2own etc.

It's all layers of porous defenses. There'd definitely be room in the world for performant dynamic language implementations with provably secure foundations.

▲

semi-extrinsic 10 hours ago | parent | next [-]

> It's all layers of porous defenses.

Also known as the "swiss cheese model" in risk management.

▲

eichin 13 hours ago | parent | prev [-]

part of why rexec is "historical" is that Guido was looking at some lockdown work and asked (twitter, probably?) the community to come up with attack ideas (on a specific more-locked-down-than-default proposed version.) After a couple of hours, it was clear that "patching the problems" was entirely doomed given how flexible python is and it was better to do something else entirely and stop pretending...

	▲	11 hours ago \| parent [-]
		[deleted]

▲

staticassertion 4 hours ago | parent | prev | next [-]

V8 itself is intended to be heavily sandboxed. Not through a microvm, but otherwise it's probably the most heavily sandboxed piece of code ever ie: in Chrome it can make virtually no system calls and runs with every restriction an OS can possibly provide and more and seccomp-bpf was basically invented for it.

Perhaps you're using v8 isolates, which then you're back into the "heavily restricted environment within the process" and you lose the things you'd want your AI to be able to do, and even then you still have to sandbox the hell out of it to be safe and you have to seriously consider side channel leaks.

And even after all of that you'd better hope you're staying up to date with patches.

MicroVMs are going to just be way simpler IMO. I don't really get the appeal of using V8 for this unless you have platform/ deployment limitations. Talking over Firecracker's vsock is extremely fast. Firecracker is also insanely safe - 3 CVEs ever, and IMO none are exploitable.

▲

ushakov 18 hours ago | parent | prev [-]

we’re not disagreeing here - i meant for general use-case VMs are better, for some application-specific calls Monty this might suffice.

although you’d still need another boundary to run your app in to prevent breaking out to other tenants.