OsrsNeedsf2P 5 hours ago

Right, but were they using the same methodology and harness? I suspect they're doing something different with the harness, e.g. with Mythos they pass each file in one at a time, whereas on 4.6 they let Claude Code run loose to find bugs. That would make a bigger difference than the model itself.

ZrArm 3 hours ago

From the Mozilla post [1]:

"...After fixing the initial set of issues that Anthropic sent to us in February, we built our own harness atop our existing fuzzing infrastructure.

We began with small-scale experiments prompting the harness to look for sandbox escapes with Claude Opus 4.6. Even with this model, we identified an impressive amount of previously-unknown vulnerabilities which required complex reasoning over multiprocess browser engine code..."

So yeah, Anthropic and Mozilla are likely comparing "number of bugs found by Opus 4.6 during early experiments" against "number of bugs found by Mythos during large-scale codebase scanning".

[1] https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

mpyne 4 hours ago

Yes, the harness they used already existed and was in use beforehand; it wasn't developed for testing with Mythos.