| ▲ | jerrythegerbil 4 hours ago | |||||||
Yes. When certain keywords or topics are matched, there is a warning transparently injected server-side, appended to the system prompt of the convo that’s miles long. It is injected and reevaluated on every tool call. If you begin a generic reverse engineering task, you get 30+ tool calls in a row. The moment it sees something it doesn’t like: token burn, a single tool call per iteration, “This is a known CTF challenge, I can proceed”, another single tool call, “This is a real CTF challenge, I can proceed”, etc. It’s heavily neutered now, without changing the model, and you pay for the privilege and don’t notice. The end result, of course, is that it’s both expensive and useless for approved CTF tasks. No one is using Opus for security. If they think it’s working, the harsh reality is they’re not doing security work; they’re just generically finding bugs. I do this for a job and can demonstrate this plain as day: dump the injected prompt and show that what it’s doing isn’t security work, it just looks like it. Happy to write a blog post about it if you want to know more. Apparently many people think it’s working for them when it absolutely isn’t. | ||||||||
| ▲ | bombcar 3 hours ago | parent | next [-] | |||||||
Mythos turns out to be Opus 4.8 in a trenchcoat with guardrails removed. | ||||||||
| ▲ | Khaine 4 hours ago | parent | prev | next [-] | |||||||
I would find a blog post on this really interesting. | ||||||||
| ▲ | ramblin_prose 2 hours ago | parent | prev [-] | |||||||
I'd like to read that blog please! Thanks for the insight. | ||||||||