simonw 20 hours ago
You mean that, instead of running the code they are writing, they pretend to run it and the model shows what it thinks would happen? I don't like that at all. Actually running the code is the single most effective protection we have against coding mistakes, from both humans and machines. I think it's absolutely worth the complexity and performance overhead of hooking up a real container environment. Not to mention you can run a useful code execution container in 100MB of RAM on a single CPU (or a slice thereof). Simulating that with an LLM takes at least one GPU and 100GB or more of VRAM.
lvl155 19 hours ago
I understand your point, but I find myself running all my agents in barebones containers, and they're basically short-run, make-or-kill types. Once we ramp up agent counts, possibly into the thousands, that could add up rapidly. Of course, you would still run milestone tests on actual containers/envs, but I think there might be a need for lighter solutions for rapid agent dev runs.
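
For concreteness, the kind of setup discussed in this thread (a small, capped container that either finishes quickly or gets killed) might look roughly like the sketch below. This is only an illustration under assumptions, not anyone's actual setup: it assumes Docker is installed, uses the `python:3.12-slim` image as a stand-in, and the helper names `build_docker_cmd` and `run_or_kill`, the 10-second timeout, and the container naming scheme are all made up for the example. The 100MB / single-CPU limits mirror the figures mentioned above.

```python
import subprocess
import uuid


def build_docker_cmd(code, name, image="python:3.12-slim", memory="100m", cpus="1"):
    """Build a `docker run` command for a throwaway, resource-capped container.

    The image and default limits are assumptions for this example.
    """
    return [
        "docker", "run",
        "--rm",               # delete the container once it exits
        "--name", name,       # named so it can be killed on timeout
        "--network", "none",  # no network access for untrusted code
        "--memory", memory,   # hard memory cap (100 MB by default here)
        "--cpus", cpus,       # CPU cap (one CPU by default here)
        image,
        "python", "-c", code,
    ]


def run_or_kill(code, timeout_s=10.0):
    """Run `code` in a short-lived container; kill it if it overruns.

    Returns the CompletedProcess on success or failure exit, or None if
    the run hit the timeout and the container was killed.
    """
    name = f"agent-run-{uuid.uuid4().hex[:12]}"
    cmd = build_docker_cmd(code, name)
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The timeout only stops the docker CLI client, so kill the
        # container itself explicitly.
        subprocess.run(["docker", "kill", name], capture_output=True)
        return None
```

With Docker available, `run_or_kill("print(2 + 2)")` would return a result whose `stdout` holds the real output of the code, which is the point made above: the agent sees what actually happened rather than a model's guess. Whether this is light enough at thousands of concurrent agents is exactly the open question in the reply.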