mukmuk 3 hours ago

I’m not sure how to reconcile Anthropic’s update / some of the exuberant comments here with recent feedback like the following from curl maintainer Daniel Stenberg:

“I see no evidence that this setup [Mythos] finds issues to any particular higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing.”

https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-v...

moomin 2 hours ago

You’re right, it’s a valid data point. But the U.K. government report is also a data point, and the Firefox report is a data point, and they suggest that it is, indeed, significantly better than current generation models. Maybe curl is significantly better hardened than most projects?

In any event, it barely matters. As Anthropic acknowledges, next-level models are coming; theirs is only one of them. Current generation models are already good at things like tracing data flow through complex systems and there’s no reason to think that capability has topped out. So within a year it seems very likely we’ll have more than one commercially available model able to find vulnerabilities cheaply.

On the other hand, it seems that they’ve made much less progress on getting it to design solutions to these issues.

ZrArm 2 hours ago

> Maybe curl is significantly better hardened than most projects?

Meanwhile from [1]:

"Not even half-way through this #curl release cycle we are already at 11 confirmed vulnerabilities - and there are three left in the queue to assess and new reports keep arriving at a pace of more than one/day."

"The simple reason is: the (AI powered) tools are this good now. And people use these tools against curl source code. They find lots of new problems no one detected before. And none of these new ones used Mythos. Focusing on Mythos is a distraction - there are plenty of good models, and people who can figure out how to get those models and tools to find things."

Yeah, it looks like there are at least 11 security bugs missed by Mythos.

[1] https://www.linkedin.com/feed/update/urn:li:activity:7463481...

solenoid0937 2 hours ago

I don't think anyone has claimed that Mythos finds all vulns in all projects. But it's very good if Mozilla's blog posts are anything to go by.

dannyobrien 2 hours ago

I think people sometimes misunderstand Daniel's point here, though it's clearer when taken in context of the rest of his article. The tools in general are getting a lot better at finding security bugs. It was unclear to Daniel, based on his usage, whether Mythos in particular is a huge step, but the Mythos generation of LLMs definitely is. Note though that Daniel was using Mythos somewhat indirectly. One thing I've taken away from the whole Mythos debate is that a) I suspect that Anthropic's GPU crunch meant that they felt they had to ration Mythos access anyway, so the calculus of whether they would release it generally was probably influenced by that, and b) finding bugs with Mythos or a similar model is still expensive -- a $20K or $100K Mythos run on curl might have shown the same level of issues as other projects like Firefox, but Daniel didn't get that kind of access.

He posted a general update today on LinkedIn which I think gives the wider context:

https://www.linkedin.com/feed/update/urn:li:activity:7463481...

> Not even half-way through this #curl release cycle we are already at 11 confirmed vulnerabilities - and there are three left in the queue to assess and new reports keep arriving at a pace of more than one/day.

> 11 CVEs announced in a single release is our record from 2016 after the first-ever security audit (by Cure 53).

> This is the most intense period in #curl that I can remember ever been through.

skybrian 3 hours ago

Different people can have different experiences without contradiction. Maybe the curl source code was pretty clean to begin with?

dreambigwrkhard 2 hours ago

imo curl is quite well maintained. there are a lot of sloppy projects out there and tools like this show who's been swimming with their pants down. not saying any project with vulnerabilities is sloppy, but when the cost of finding bugs and vulnerabilities decreases significantly, they will get exposed with enough time and tokens ($)

kadoban 2 hours ago

Curl has more eyes on it, and has had more tools thrown at it, and is better tested (and developed?) than 99% of software, it's very much not the norm. I wouldn't be surprised if that has something to do with it, if there is any kind of bias there (not sure if there is, it's also possible he's just right).

mayneack 2 hours ago

Daniel has been posting for months (years?) about how much scrutiny he gets from security researchers and various automated tools. I wouldn't expect curl to be the average case for Mythos.

3kahg 2 hours ago

It is the opposite. Security people focus on curl, sudo because they are code bases that contained a lot of features and unused code from the 1990s.

They don't focus on projects where they find nothing. They certainly don't advertise when they find nothing.

Getting a lot of scrutiny is not the recommendation that it appears to be. What is the new standard? Projects that never have bugs are deemed to be suspect because they "have not been scrutinized" (they have, but null results never go public)?

So Mythos only finding one issue after other tools have found 300 this year is embarrassing. Mythos was supposed to be better and novel.

tptacek 2 hours ago

It is definitely not the case that curl has been or is now a marquee vulnerability research target. It's a CLI HTTP fetcher. It's the same with sudo. It's a big deal if a sudo vulnerability gets found, because it's an extremely load-bearing piece of software, but sudo is itself not a prime target, because it doesn't do much.

43ahg 2 hours ago

There is no claim that it is a "vulnerability research target". It is a bug finding magnet, and bugs can be found by anything from gcc warnings to AI tools.

No, it didn't attract a bluepill exploit research.

The point that 300 bugs found in a year is not a recommendation, as the pro-AI mafia suddenly claims ("because it has been analyzed!"), still stands. Maybe the AI-mafia should sell "analyzed by Mythos" labels to impress people who don't write public software or find bugs for that matter.

tptacek an hour ago

What’s a “bluepill exploit”?

aSJH1 an hour ago

An exploit of the magnitude or impact of this one:

https://en.wikipedia.org/wiki/Blue_Pill_(software)

Now, since you are a literalist, you'll come up with some other nitpick and gain another 1000 Internet points from the AI people. Perhaps a comma is missing somewhere.

enraged_camel an hour ago

Did you... create a new account just to be able to respond to Thomas?

Btw, he's a security researcher. You should be more respectful.

1248wu 27 minutes ago

And enraged_camel is an AI booster. Feel free to point me to his research from the last 30 years.

nozzlegear an hour ago

If I said what I think, dang would tell me to read the site's guidelines.

elisbce 2 hours ago

He already scanned the codebase with Codex Security and a whole bunch of other AI tools, and fixed 200-300 bugs and CVEs. That Mythos found 1 more bug and 1 more CVE on top of that is already impressive.

TacticalCoder 2 hours ago

> I’m not sure how to reconcile anthropic’s update ...

Why not? TFA says 23 000 findings "of all severities" and then, in the end, only 88 security advisories published.

What we'd really need is how many security advisories not related to Mythos findings have been published in the same time. If it's, say, 500 security advisories (just making a number up), wouldn't Anthropic's update in TFA and Daniel Stenberg's comments reconcile?

Like, yup, we've got a new tool to find exploits. It's a tool. It's new. We already had tools. Let's make the software world a bit more secure.

Now if you tell me that 100 security advisories have been published in that timespan and that 88 were due to Anthropic's Mythos: now I'd have to say that it's hard to reconcile Daniel Stenberg's position with TFA.