shiandow 2 hours ago

For a company that puts DO NOT FUCKING GUESS in their instructions, they made a heck of a lot of assumptions:

- assumed tokens were scoped (despite this apparently not even being an existing feature?)

- assumed an LLM didn't have access

- assumed an LLM wouldn't do something destructive, given the power

- assumed backups were stored somewhere else (to anyone reading: if you don't know where your backups are, you're making the same assumption; see the restore test sketched below)
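
To make that last one concrete: a backup you have never restored is an assumption, not a backup. Here's a minimal restore smoke test, assuming a Postgres setup with the client tools on PATH; the dump path and scratch database name are placeholders I made up:

    # Hypothetical restore smoke test; all paths and names are placeholders.
    # Assumes PostgreSQL client tools (dropdb, createdb, pg_restore, psql).
    import subprocess
    import sys

    BACKUP_PATH = "/backups/latest.dump"   # wherever your dumps land
    SCRATCH_DB = "restore_smoke_test"      # throwaway database

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run(["dropdb", "--if-exists", SCRATCH_DB])
    run(["createdb", SCRATCH_DB])
    run(["pg_restore", "--dbname", SCRATCH_DB, BACKUP_PATH])

    # Sanity check: did anything actually come back?
    out = subprocess.run(
        ["psql", "-At", "-d", SCRATCH_DB, "-c",
         "SELECT count(*) FROM pg_tables WHERE schemaname = 'public';"],
        check=True, capture_output=True, text=True,
    )
    if int(out.stdout.strip()) == 0:
        sys.exit("Restore produced an empty database; the backups are not real.")
    print("Restore OK:", out.stdout.strip(), "tables")

If you can't run something like this, you don't actually know where your backups are.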

Also, you should never give LLMs instructions that rely on metacognition. You can tell them not to guess, but they have no internal monologue; they cannot know anything. They also cannot plan to do something destructive, so telling them to ask first is pointless. A text completion only has the information that it is writing something destructive after the fact.
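
That follows directly from how sampling works. A toy autoregressive loop (with a stub sampler standing in for a real model; nothing here is a real API) makes it obvious there is no step where a plan could be inspected:

    # Toy autoregressive loop. The stub sampler stands in for a real model;
    # the point is structural: the only input at each step is the tokens
    # emitted so far. There is no plan to inspect and no lookahead.
    import random

    class ToyModel:
        vocab = ["ls", "echo", "rm", "-rf", "done"]

        def next_token(self, tokens):
            # A real model would score the vocab conditioned on `tokens`.
            return random.choice(self.vocab)

    def generate(model, prompt, max_tokens=6):
        tokens = list(prompt)
        for _ in range(max_tokens):
            # "DO NOT GUESS" is just more tokens in the prefix; the model
            # only "knows" it wrote something destructive once it has.
            tokens.append(model.next_token(tokens))
        return tokens

    print(generate(ToyModel(), ["DO", "NOT", "GUESS"]))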

gwerbin a few seconds ago | parent | next [-]

The thing that seems to bring out these extremely unlikely destructive token sequences is letting agents just run for a long time. I wonder if some kind of weird subliminal chaos signal develops in the context when the AI repeatedly consumes its own output.

Personally, I don't even let my agent run a single shell command without asking for approval. That's partly because I haven't set up a sandbox yet, but even with a sandbox there is a huge "hazard surface" to be mindful of.
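
The gate itself is trivial. A rough sketch (a hypothetical wrapper, not any particular harness's API):

    # Hypothetical approval gate: every shell command the agent proposes
    # goes through a human first. Not any real harness's API.
    import shlex
    import subprocess

    def run_with_approval(command: str) -> str:
        print(f"Agent wants to run: {command}")
        if input("Approve? [y/N] ").strip().lower() != "y":
            return "DENIED: user rejected the command."
        result = subprocess.run(
            shlex.split(command), capture_output=True, text=True
        )
        return result.stdout + result.stderr

    # The agent only ever sees the gated result, never a raw shell.
    print(run_with_approval("ls -la"))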

I wonder if AI agent harnesses should have some kind of built-in safety measure where, instead of simply compacting context and proceeding, they actually shut down the agent and restart it. Something like the loop below, where Agent, the context limit, and the note-distilling step are all stand-ins rather than any real harness API:
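
    # Sketch of "restart instead of compact". Agent, the limit, and the
    # note-distilling step are all stand-ins, not a real harness API.
    from dataclasses import dataclass

    CONTEXT_LIMIT = 2_000  # characters, as a crude stand-in for tokens

    @dataclass
    class Agent:
        context: str = ""

        def step(self, prompt: str) -> str:
            # Stub: a real agent would call a model and tools here.
            self.context += prompt
            return f"worked on: {prompt[:40]}"

    def run_task(task: str, max_restarts: int = 3) -> None:
        notes = task
        for restart in range(1, max_restarts + 1):
            agent = Agent()  # brand-new context, no compacted residue
            while len(agent.context) < CONTEXT_LIMIT:
                agent.step(notes)
            # At the limit: distill durable notes, shut the agent down,
            # and start over clean instead of compacting in place.
            notes = f"resume '{task}', restart {restart}"
            print(f"restart {restart}: context hit limit, starting fresh")

    run_task("refactor the billing module")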

That said, I also think even the most advanced agents generate code that I would never want to base a business on, so the whole thing seems ridiculous to me. This article has the same energy as losing money on NFTs.

coalstartprob an hour ago | parent | prev [-]

it's like telling a kid DO NOT STEAL FROM THE COOKIE JAR lmao