| ▲ | simonw 20 hours ago |
| I got a WebAssembly build of this working and fired up a web playground for trying it out: https://simonw.github.io/research/monty-wasm-pyodide/demo.ht... It doesn't have class support yet! But it doesn't matter, because LLMs that try to use a class will get an error message and rewrite their code to not use classes instead. Notes on how I got the WASM build working here: https://simonwillison.net/2026/Feb/6/pydantic-monty/ |
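(For illustration, a made-up sketch of the kind of rewrite this tends to produce; the snippet is hypothetical and not taken from the playground:)

    # Hypothetical example: when `class` raises an error in a restricted
    # interpreter, the model typically falls back to plain functions and dicts.
    #
    # What it tried first:
    #
    #   class Counter:
    #       def __init__(self):
    #           self.total = 0
    #       def add(self, n):
    #           self.total += n
    #
    # What it rewrites to:

    def make_counter():
        return {"total": 0}

    def add(counter, n):
        counter["total"] += n

    c = make_counter()
    add(c, 5)
    print(c["total"])  # prints 5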
|
| ▲ | jstanley 8 hours ago | parent | next [-] |
| > But it doesn't matter, because LLMs that try to use a class will get an error message and rewrite their code to not use classes instead. This is true in a sense, but every little papercut at the lower levels of abstraction degrades performance at the higher levels, as the LLM has to spend effort hacking around jank in the Python interpreter instead of solving the real problem. |
| |
| ▲ | qwertox 6 hours ago | parent [-] | | It is a workaround, so we can assume it will be temporary; in the future the AI will start using classes once it can, probably much like we would. | | |
| ▲ | cyanydeez 3 hours ago | parent [-] | | The entire AI stack is built on a lot of "assumes" about intelligent selection. It reminds me of the evolutionary debate: just because something can learn to adapt doesn't mean it will find an optimized adaptation, nor that it will continually refine it. As far as I can tell, AI will only solve problems well where the problem space is properly defined, and most people won't know how to do that. |
|
|
|
| ▲ | vghaisas 10 hours ago | parent | prev | next [-] |
| This is very cool, but I'm having some trouble understanding the use cases. Is this mostly just for codemode where the MCP calls instead go through a Monty function call? Is it to do some quick maths or pre/post-processing to answer queries? Or maybe to implement CaMeL? It feels like the power of terminal agents is partly because they can access the network/filesystem, and so sandboxed containers are a natural extension? |
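(For context, a rough sketch of the "code mode" idea mentioned above: the model writes one script that calls tool functions directly instead of issuing one MCP call per step. The tool names here are made up for illustration and are not Monty's or any MCP server's actual API:)

    # Hypothetical "code mode" script: search_issues and post_summary stand in
    # for MCP-backed tools that the host would inject into the interpreter.
    def search_issues(state):
        # stand-in for a real MCP tool call
        return [{"labels": ["bug"]}, {"labels": ["bug", "docs"]}]

    def post_summary(text):
        # stand-in for a real MCP tool call
        print(text)

    issues = search_issues(state="open")      # one tool call
    by_label = {}
    for issue in issues:                      # post-processing happens in Python,
        for label in issue["labels"]:         # with no extra model round-trips
            by_label[label] = by_label.get(label, 0) + 1
    post_summary(f"Open issues by label: {by_label}")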
| |
| ▲ | 16bitvoid 9 hours ago | parent [-] | | It's right there in the README. > Monty avoids the cost, latency, complexity and general faff of using a full container-based sandbox for running LLM-generated code. > Instead, it lets you safely run Python code written by an LLM embedded in your agent, with startup times measured in single-digit microseconds, not hundreds of milliseconds. | | |
| ▲ | vghaisas 4 hours ago | parent [-] | | Oh, I did read the README, but I still have the question: while it does save on cost, latency, and complexity, the tradeoff is that the agents can't run whatever they want in a sandbox, which makes them less capable too. |
|
|
|
| ▲ | saberience 8 hours ago | parent | prev | next [-] |
| I really don't understand the use case here. My models are writing code all day in 3 or 4 different languages, so why would I want to: a) restrict them to Python, or b) restrict them to a cut-down, less useful version of Python? My models write me TypeScript and C# and Python all day with zero issues. Why do I need this? |
| |
| ▲ | srcreigh 15 minutes ago | parent | next [-] | | It’s a sandbox. If your model generates and runs a script for each email in your inbox and has access to sensitive information, you want to make sure it can’t communicate externally. | |
| ▲ | falcor84 4 hours ago | parent | prev | next [-] | | For extremely rapid iteration: they can run a quick script with this in under 1ms, which removes a significant bottleneck, especially for math-heavy reasoning. | | |
| ▲ | bonoboTP 2 hours ago | parent [-] | | Not sure if I get it, but it seems to me that this is not for "producing code", e.g. for your projects or for doing things on your computer, but essentially for supplementing the model's own thinking process. It runs Python code to count how many letter Rs are in "strawberry" if you ask that, or does quick math, quick sorting, and simple, well-defined tasks like this that are needed to answer the query or do the job you asked for. It's not intended to be read by the user and it's not a "deliverable" for the user. | | |
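(A small, made-up example of the kind of throwaway script meant here:)

    # Quick scratch computations to support the model's own reasoning;
    # the output feeds back into the answer, not into a deliverable.
    word = "strawberry"
    print(word.count("r"))            # 3

    values = [12.5, 7.25, 19.0]
    print(sum(values) / len(values))  # quick math: ~12.92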
| ▲ | falcor84 an hour ago | parent [-] | | I'm working on a system where user requests queue up agentic runs that do a bit of analytical reasoning and then return a response to the user, and this interpreter can possibly help me significantly reduce the runtime of these jobs. |
|
| |
| ▲ | zahlman 4 hours ago | parent | prev [-] | | For sandboxing, as described in the README. |
|
|
| ▲ | otabdeveloper4 8 hours ago | parent | prev | next [-] |
| > and rewrite their code to not use classes instead Only if the training data has enough Python code that doesn't use classes. (We're in luck that these things are trained on Stack Overflow code snippets.) |
|
|
| ▲ | dhdjfhfjfn 16 hours ago | parent | prev | next [-] |
| [flagged] |
| |
| ▲ | simonw 16 hours ago | parent [-] | | You're really stretching things here to classify me pointing out that LLMs can handle syntax errors caused by partial implementations of Python as "being a vapid propagandist". (This kind of extremely weak criticism often seems to come from newly created Hacker News accounts, which makes me wonder if it's mostly the same person using sockpuppets.) | | |
| ▲ | johnfn 14 hours ago | parent [-] | | Sorry for this, Simon. But just know that this non-newly-created hacker news account does not think you are a “vapid propagandist” and appreciates your content. |
|
|
|
| ▲ | ujjaog72 7 hours ago | parent | prev | next [-] |
| [flagged] |
| |
| ▲ | rob 6 hours ago | parent [-] | | Warning: another fake troll account just created for this comment. The same one left a comment last night on a new account under Simon's comment as well but was flagged. |
|
|