rst 11 hours ago

Most of the time, the agent should be able to run the code and observe the errors for itself, but there are exceptions. For instance, I've had agents write code that processes data which, by company policy, can't be exposed to cloud services (confidential customer communications, etc.), a prohibition that includes cloud-hosted LLMs. When that blows up, I've had to give the agent a bug report. To avoid excessive back-and-forth, I package the report up well enough that the bot can reproduce the failure on sanitized excerpts and produce a fix autonomously from that.
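
A rough sketch of that kind of packaging, in Python. Everything in it is a hypothetical illustration, not the actual tooling: the `sanitize` and `build_repro` names, the two regexes, and the fields in the bundle are all assumptions, and real redaction rules would depend on the data and the policy in question. The idea is to mask the sensitive parts while keeping the structure of the input, then confirm the failure still occurs on the masked version before handing it over.

```python
import traceback
from typing import Callable, Optional

import re

# Hypothetical redaction rules; a real policy would likely need more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
DIGIT = re.compile(r"\d")


def sanitize(text: str) -> str:
    """Mask emails and digits but keep delimiters, so structural bugs still trigger."""
    text = EMAIL.sub("user@example.com", text)
    return DIGIT.sub("0", text)


def build_repro(func: Callable[[str], object], raw_input: str) -> Optional[dict]:
    """Run func on the sanitized input and bundle the failure, if it still fails.

    Returns None when the sanitized input no longer reproduces the error,
    which means the redaction removed whatever triggered the bug.
    """
    excerpt = sanitize(raw_input)
    try:
        func(excerpt)
    except Exception as exc:
        return {
            "input": excerpt,
            "error_type": type(exc).__name__,
            "error_message": str(exc),
            "traceback": traceback.format_exc(),
        }
    return None
```

If `build_repro` returns `None`, the sanitized excerpt is not a usable reproduction, and the redaction has to be loosened in a way that still complies with policy before the bundle is worth sending.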