skybrian 5 hours ago

A year ago, LLMs weren't good enough to find these security issues. They could have done other stuff, but then again, the big tech companies were already doing other stuff: bug bounties, fuzzing, rewriting key libraries, and so on.

This initiative probably could have started a few months sooner with Opus and similar models, though.

adrian_b 4 hours ago | parent | next [-]

Using multiple older open-weights models, you can find all of the security issues that Mythos has found.

However, no single one of those models could find everything that Mythos found.

https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag...

Nevertheless, the gap between free models and Mythos is not as large as Anthropic's marketing claims, which of course is not surprising.

In general, this is expected to hold for other applications as well: no single model is equally good at everything, even among SOTA models, so trying multiple models may be necessary to get the best results. With open-weights models, trying many of them may add negligible cost, especially if they are hosted locally.

causal 5 hours ago | parent | prev | next [-]

That's not quite true: even a year ago, LLMs were finding vulnerabilities, especially when paired with an agent harness and lots of compute. And even before that, security researchers had been shouting about systemic fragility.

Mythos certainly represents a big increase in exploitation capability, and we should have seen this coming.

Analemma_ 5 hours ago | parent [-]

A lot of those bugs were found by seasoned developers and security professionals, though. Anthropic claims that people with no security background are finding vulns with Mythos: they just typed "hey, go find a vulnerability in X", went home for the night, and came back the next morning to a ready PoC. They could definitely be exaggerating, but if it's true, that's a very different threat category, and one worth paying attention to.

qingcharles 5 hours ago | parent | next [-]

Previous models have done this just fine. For the last year, whenever a new model has come out, I've just pointed it at some of my repos and said something like "scan this entire codebase; look for bugs, overengineering, security flaws, etc.", and they always find a few useful things. Obviously each new model does this better than the last, though.

causal 5 hours ago | parent | prev [-]

Yes, previous models found vulnerabilities, but Mythos is uniquely capable of actually exploiting them: https://red.anthropic.com/2026/mythos-preview/

pxc 5 hours ago | parent [-]

Imo that's a big deal primarily because the problem with automatically discovered vulnerabilities has long been the high volume of reports and a very bad signal-to-noise ratio. When an LLM is capable of developing PoC exploits, you finally have a tool that makes meaningfully triaging reports like this possible.

pixel_popping 5 hours ago | parent | prev | next [-]

If you run Opus 4.6 and GPT 5.4 in a loop right now (maybe 100 times) against the top XXXX repos, I guarantee you'll find, at the very least, medium-severity vulnerabilities.

alephnerd 4 hours ago | parent | prev | next [-]

> A year ago the LLM's weren't good enough to find these security issues

I know of two F100s that had already started using foundation models for SCA in tandem with other products back in 2024. It's noisy, but depending on the environment, a false positive is less harmful than a real vulnerability going undetected.

vonneumannstan 5 hours ago | parent | prev [-]

>This initiative probably could have started a few months sooner with Opus and similar models, though.

Evidently they tried, and even the most recent Opus 4.6 models couldn't find much. There's been a step change in capabilities here.

causal 5 hours ago | parent [-]

No, Opus has found a lot: 112 vulnerabilities were reported to Firefox by Opus alone [0]. But Mythos is uniquely capable of exploiting vulnerabilities, not just finding them.

[0] https://red.anthropic.com/2026/mythos-preview/