simonw 3 days ago

I think the rule still applies that you should consider any tools as being under the control of anyone who manages to sneak instructions into your context.

Which is a pretty big limitation in terms of things you can safely use them for!

backflippinbozo a day ago

We built agents to test GitHub repo quickstarts associated with arXiv papers a couple of months before this paper was published, and wrote about it publicly here: https://remyxai.substack.com/p/self-healing-repos

We've been pushing it further to implement draft PRs in your target repo; that write-up was published a month before this preprint: https://remyxai.substack.com/p/paperswithprs

To limit the attack surface, we added PR #1929 to AG2 so we could pass API keys to the DockerCommandLineCodeExecutor, and we use egress whitelisting to block an agent from reaching a compromised server: https://github.com/ag2ai/ag2/pull/1929
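
For context, the setup looks roughly like this (a minimal sketch using the docker Python SDK, not our actual code; the image names, the allowlist proxy config, and the env var names are all illustrative):

    import docker

    client = docker.from_env()

    # "internal" network: containers attached here have no route off the host.
    internal = client.networks.create("agent-internal", driver="bridge", internal=True)
    # Normal bridge network for the proxy's outbound leg.
    egress = client.networks.create("agent-egress", driver="bridge")

    # Allowlisting forward proxy (hypothetical image with an allow list for
    # api.openai.com, pypi.org, etc. baked in). It straddles both networks,
    # so it is the only path to the outside world.
    proxy = client.containers.run(
        "egress-proxy-allowlist:latest", detach=True,
        name="egress-proxy", network="agent-internal",
    )
    egress.connect(proxy)

    # Sandbox where the agent runs repo code: API keys injected as env vars
    # at runtime, all HTTP(S) traffic pointed at the allowlist proxy.
    sandbox = client.containers.run(
        "agent-sandbox:latest", detach=True,
        name="agent-sandbox", network="agent-internal",
        environment={
            "OPENAI_API_KEY": "sk-...",  # passed in, never baked into the image
            "HTTP_PROXY": "http://egress-proxy:8888",
            "HTTPS_PROXY": "http://egress-proxy:8888",
        },
    )

Note the proxy env vars are advisory (malicious code can ignore them), but since the sandbox sits on an internal-only network, ignoring the proxy just means no connectivity at all.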

Since then, we've been scaling this out with Ray workers on k8s so we can run it in the cloud and keep up with the hundreds of papers published daily.
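
The fan-out itself is the easy part. A minimal sketch of the pattern (the test_quickstart body here is a placeholder for the real harness, not our pipeline):

    import ray

    ray.init()  # on k8s this attaches to the Ray cluster, e.g. via KubeRay

    @ray.remote(num_cpus=1)
    def test_quickstart(repo_url: str) -> dict:
        # Placeholder: clone the repo, run its quickstart inside the
        # sandboxed Docker executor, and report pass/fail plus logs.
        return {"repo": repo_url, "passed": True}

    repo_urls = ["https://github.com/example/paper-code"]  # one per paper
    results = ray.get([test_quickstart.remote(url) for url in repo_urls])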

By running in Docker, constraining the network interface, deploying in the cloud, and ultimately keeping a human in the loop through PR review, it's hard to see where a prompt-injection attack comes into play when testing the code.

Would love to get feedback from an expert on this. Can you imagine an attack scenario, Simon?

I'll need to work out a check for the case where someone creates a paper with code instructing my agent to publish keys to a public HF repo for others to exfiltrate.
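
One cheap guardrail for that case (a sketch, not our pipeline; the patterns are illustrative and real scanners like gitleaks or trufflehog ship far more complete rule sets) is to scan anything the agent is about to push for key-shaped strings and block the PR on a hit:

    import re
    import sys

    # Illustrative patterns for common credential formats.
    KEY_PATTERNS = [
        re.compile(r"sk-[A-Za-z0-9]{20,}"),   # OpenAI-style keys
        re.compile(r"hf_[A-Za-z0-9]{30,}"),   # Hugging Face tokens
        re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key IDs
    ]

    def scan_diff(diff_text: str) -> list[str]:
        # Return every key-shaped string found in an outgoing diff.
        return [m.group(0) for p in KEY_PATTERNS for m in p.finditer(diff_text)]

    if __name__ == "__main__":
        hits = scan_diff(sys.stdin.read())
        if hits:
            print(f"Blocked: {len(hits)} possible secret(s) in outgoing diff")
            sys.exit(1)

Piping the agent's git diff through a check like this before it opens the draft PR would catch the naive version of that exfiltration, though not an attacker who encodes the key first.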