Remix.run Logo
jml78 3 hours ago

At my job we have tooling that scans our code repos with Opus. Yes it can find stuff however it doesn’t find everything.

I am able to get Opus and Sonnet to function as a red team agent. We don’t have some crazy special sauce, just a lot of trial and error. Basically add enough context proving we own the code and running services that it will run attempts to compromise our services.

It found tons of stuff that was not found with just scanning the code. It found serious security issues that had been in productions for years that humans never found. They weren’t things that were accessible externally but serious enough that we are thrilled to have these tools.

I can say that Fable did refuse to function with our harness. I am worried that soon you have to be in the special club to do this stuff with the SOTA models. A small company like ours doesn’t get accepted to their programs that remove guardrails. Even though our CEO has found and disclosed vulnerabilities to multiple companies and holds a patent around federated authentication.