This is one of the reasons I'm so interested in sandboxing. A great way to reduce the need for review is to have ways of running code that limit the blast radius if the code is bad. Running code in a sandbox can mean that the worst that can happen is a bad output as opposed to a memory leak, security hole or worse.

▲

MeetingsBrowser 6 hours ago | parent | next [-]

Isn’t “bad output” already worst case? Pre-LLMs correct output was table stakes.

You expect your calculator to always give correct answers, your bank to always transfer your money correctly, and so on.

	▲	swiftcoder 30 minutes ago \| parent [-]
		> Isn’t “bad output” already worst case? Worst case in a modern agentic scenario is more like "drained your bank account to buy bitcoin and then deleted your harddrive along with the private key" > Pre-LLMs correct output was table stakes We're only just getting to the point where we have languages and tooling that can reliably prevent segfaults. Correctness isn't even on the table, outside of a few (mostly academic) contexts

▲

KnuthIsGod 6 hours ago | parent | prev [-]

And if the bad output leads to a decision maker making a bad decision, that takes down your company or kills your relative ?

	▲	riffraff 5 hours ago \| parent [-]
		The sandbox in question was to absorb shrapnel from explosions, clearly