We Found Zero Low-Severity Bugs in 165 AI Code Reports. Zero (shamans.dev)
15 points by dmonroy 2 days ago | 14 comments
lpapez 2 days ago | parent | next [-]

What is the overall severity distribution, including human code?

Based on the churn I have fixing security vulnerabilities reported by Snyk and Trivy, I have a feeling that issues have a tendency to be labeled mostly as HIGH or CRITICAL when they are assigned a CVE, for better or worse.

dmonroy 2 days ago | parent | next [-]

You're absolutely right about CVE inflation. I deal with the same Snyk/Trivy noise daily where a prototype pollution in some deep dependency gets marked CRITICAL.

Our distribution (71% High, 18% Critical) is definitely skewed compared to normal CVEs. Part of this is selection bias: nobody reports when AI generates boring secure code. But even accounting for that, the pattern is real: AI seems to either nail security or fail spectacularly. Very few "medium" mistakes.

The key difference from your Snyk alerts: these aren't dependency updates or theoretical vulnerabilities. They're actual logic flaws:

- Missing auth checks
- SQL injections
- Hardcoded secrets

You know, the stuff that makes you go "how did this pass code review?"
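To make the SQL injection case concrete, here's a minimal sketch (hypothetical code, not drawn from the report) of the pattern in question, alongside the parameterized fix:

```python
import sqlite3

def find_user_unsafe(conn, username):
    # The kind of code AI assistants often emit: user input interpolated
    # straight into SQL, exploitable via input like "' OR '1'='1".
    return conn.execute(
        f"SELECT id, name FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the driver treats the value as a literal,
    # so the injection payload matches nothing.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()
```

Same query, one character class of difference in outcome: the unsafe version hands an attacker the whole table.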

This is ongoing research, and hopefully we'll be in a position to elaborate better conclusions soon.

DeepYogurt a day ago | parent | prev [-]

Highs and Criticals together are more than 50%.

https://nvd.nist.gov/general/nvd-dashboard

eqvinox a day ago | parent | prev | next [-]

I neither understand where the HN title line is coming from, nor what this report is trying to tell me. AI is introducing high-severity bugs rather than low-severity ones? That's… bad? Is this based on actual reports, or on its own analysis? Actual reports will have survivorship bias, since higher severities are reported more actively and more quickly…

Anyway, I see numbers but no message.

TrinaryWorksToo 2 days ago | parent | prev | next [-]

How do we know this isn't Survivorship Bias? Perhaps there aren't any low-severity bugs because they're all high severity?

dmonroy a day ago | parent | next [-]

That's absolutely a factor here. We are missing the stuff that no one is talking about: "AI generated an inefficient loop" or "AI forgot to close a file handle". The documented cases were documented precisely because they were noteworthy.

That said, even with survivorship bias, there's a pattern.

When humans write bad code, we see the full spectrum, from typos to total meltdowns. With AI, the failures cluster around specific security fundamentals:

- Input validation
- Auth checks
- Rate limiting
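The first two of those clusters can be sketched in a few lines (a hypothetical `update_email` handler, invented for illustration, not taken from the study):

```python
def update_email(current_user, target_user_id, new_email):
    # Auth check that AI-generated handlers often omit: verify the
    # caller owns the record (or is an admin) before mutating it.
    if current_user["id"] != target_user_id and not current_user.get("is_admin"):
        raise PermissionError("cannot modify another user's account")
    # Basic input validation, also frequently skipped in tutorials.
    if "@" not in new_email or len(new_email) > 254:
        raise ValueError("invalid email address")
    return {"id": target_user_id, "email": new_email}
```

The point is how cheap these checks are to write, and how invisible their absence is until someone probes for it.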

I've never seen an AI make a typo, have you?

Does that mean AI learned to code from tutorials that skip the boring security chapters? Think about it.

So yes, we are definitely seeing survivorship bias in severity reporting. But the "types" of survivors tell us something important about what AI consistently misses. The low-severity bugs probably exist; they just aren't making headlines.

The real question: if this is just the visible part of the iceberg, what's underneath?

hinkley 10 hours ago | parent | prev | next [-]

The fact that they don't mention them makes them the most likely case.

"Did you hit your wife?"

"I haven't murdered anybody."

"Murder?? Nobody mentioned murder, Mr Fieldman."

dfcheng a day ago | parent | prev [-]

This is what I’ve experienced having LLMs code: ensuring security is not an adequate part of its training. Of course, modern developers I work with don’t give a shit either.

dmonroy 19 hours ago | parent [-]

That last part is, well, current reality.

The difference is you can at least shame your colleagues into caring about security and coding standards during code review. With AI, it's like it learned from every tutorial that said "we'll skip input validation to keep this example simple" and took that as a strict rule.

weare138 2 days ago | parent | prev [-]

This is an ongoing longitudinal study with inherent reporting biases and coverage limitations.

Well at least they're honest...

dmonroy a day ago | parent [-]

You caught us! ...and it turns out "we don't have all the data" isn't exactly the pitch VCs want to hear.

Jokes aside, I'd rather admit we are working with incomplete data than pretend otherwise. We are probably seeing 5-10% of what's actually happening out there. Most AI code bugs die quietly in projects that never see production. And it is perhaps better that way.

[not]Fun fact: A colleague just told me how a rogue Claude agent ran `rm -rf ~/` in a background process earlier today. It might become #166 in our report.

hinkley 10 hours ago | parent | next [-]

Generally when you have incomplete data, it pays not to double down on your findings in the title.

Makes you look guilty. Which perhaps you are.

weare138 a day ago | parent | prev [-]

Well, I don't deal with VCs, but from a technical perspective that's an odd way to phrase it. The perfectly valid explanation in your response is what people in the tech scene would expect, but if this is a VC money grab then I guess you know your intended audience.

dmonroy 18 hours ago | parent [-]

Turns out I'm not as good at joking as I think I am. The rest of the response, btw, was legit.