| ▲ | antirez 12 hours ago |
| That's not what is happening right now. The bugs are often filtered later by LLMs themselves: if the second pipeline can't reproduce the crash / violation / exploit in any way, the false positives are usually evicted before ever reaching human scrutiny. Checking whether a real vulnerability can be triggered is a trivial task compared to finding one, so this second pipeline has an almost 100% success rate in both directions: if a report passes the second pipeline, it is almost certainly a real bug, and very few real bugs fail to pass it. It does not matter how much LLMs advance; people ideologically against them will always deny they have an enormous amount of usefulness. This is expected in the normal population, but to see so many people on Hacker News who can't see with their own eyes feels weird. |
|
| ▲ | uhx 10 hours ago | parent | next [-] |
| > Checking if a real vulnerability can be triggered is a trivial task compared to finding one Have you ever tried to write a PoC for any CVE? This statement is wrong. Sometimes a bug may exist but be impossible to trigger or exploit. So it is not trivial at all. |
| |
| ▲ | avemg 9 hours ago | parent | next [-] | | I'm tickled at the idea of asking antirez [1] if he's ever written a PoC for a CVE. [1] https://en.wikipedia.org/wiki/Salvatore_Sanfilippo | | |
| ▲ | tptacek 8 hours ago | parent | next [-] | | This happens over and over in these discussions. It doesn't matter who you're citing or who's talking. People are terrified and are reacting to news reflexively. | | |
| ▲ | antirez 5 hours ago | parent | next [-] | | Hi! Loved your recent post about the new era of computer security, thanks. | | | |
| ▲ | emp17344 6 hours ago | parent | prev [-] | | Personally, I’m tired of exaggerated claims and hype peddlers. Edit: Frankly, accusing perceived opponents of being too afraid to see the truth is poor argumentative practice, and practically never true. |
| |
| ▲ | jedberg 8 hours ago | parent | prev | next [-] | | I actually like when that happens. Like when people "correct" me about how reddit works. I appreciate that we still focus on the content and not who is saying it. | | |
| ▲ | tptacek 7 hours ago | parent [-] | | That's not really what happened on this thread. Someone said something sensible and banal about vulnerability research, then someone else said do-you-even-lift-bro, and got shown up. | | |
| ▲ | jedberg 7 hours ago | parent [-] | | That's true in this particular case, but I was talking more about the general case. |
|
| |
| ▲ | LeFantome 8 hours ago | parent | prev [-] | | Sure he wrote a port scanner that obscures the IP address of the scanner, but does he know anything about security? /s Oh, and he wrote Redis. No biggie. | | |
| |
| ▲ | antirez 10 hours ago | parent | prev | next [-] | | First, I have a long past in computer security, so: yes, I used to write exploits. Second, verifying the vulnerability does not require being able to exploit it, only triggering an ASan assert. With memory corruption that's often very simple, and enough to verify the bug is real. | |
| ▲ | freedomben 10 hours ago | parent | prev | next [-] | | I'm not GP, but I've written multiple PoCs for vulns, and I agree with GP. Finding a vuln is often very hard. Yes, sometimes exploiting it is hard too (and requires chaining), but knowing where the vuln is is, most of the time, the hard part. | |
| ▲ | e12e 9 hours ago | parent | prev | next [-] | | Note the exploit Claude wrote for the blind SQL injection found in Ghost, in the same talk: https://youtu.be/1sd26pWhfmg?is=XLJX9gg0Zm1BKl_5 | |
| ▲ | orochimaaru 8 hours ago | parent | prev [-] | | Oh no. Antirez doesn't know anything about C, CVEs, networking, or the Linux kernel. Wonder where that leaves most of us. |
|
|
| ▲ | discordianfish 10 hours ago | parent | prev | next [-] |
| I’ve been around long enough to remember people saying that VMs are a useless waste of resources with dubious claims about isolation, that the cloud is just someone else's computer, that containers are pointless, and now it's AI. There is an astonishing amount of conservatism in the hacker scene. |
| |
| ▲ | pdntspa 10 hours ago | parent | next [-] | | Well, the cloud is someone else's computer. | | |
| ▲ | some_random 9 hours ago | parent [-] | | It is, but that's not a useful or insightful thing to say | | |
| ▲ | pdntspa 18 minutes ago | parent | next [-] | | Only if owning the means of your production isn't important to you | |
| ▲ | Calavar 8 hours ago | parent | prev | next [-] | | It's not an insightful statement right now, but it was at the peak of cloud hype ca. 2010, when "the cloud" was often used in a metaphorical sense. You'd hear things like "it's scalable because it's in the cloud" or "our clients want a cloud based solution." Replacing "the cloud" in those sorts of claims with "another person's computer" showed just how inane those claims were. | |
| ▲ | honeycrispy 9 hours ago | parent | prev | next [-] | | Are you sure about that? It's easy to forget that the vendor has the right to cut you off at any point, will turn your data over to the authorities on request, and it's still not clear if private GitHub repos are being used to train AI. | |
| ▲ | LeFantome 8 hours ago | parent | prev [-] | | [dead] |
|
| |
| ▲ | gbacon 9 hours ago | parent | prev [-] | | Is it conservatism or just the Blub paradox? As long as our hypothetical Blub programmer is looking down the power continuum, he knows he's looking down. Languages less powerful than Blub are obviously less powerful, because they're missing some feature he's used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn't realize he's looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub. https://paulgraham.com/avg.html |
|
|
| ▲ | BodyCulture 11 hours ago | parent | prev | next [-] |
| Can we study this second pipeline? Is it open so we can understand how it works? Did not find any hints about it in the article, unfortunately. |
| |
| ▲ | maximilianburke 11 hours ago | parent | next [-] | | From the article by 'tptacek a few days ago (https://sockpuppet.org/blog/2026/03/30/vulnerability-researc...) I essentially used the prompts suggested. First prompt: "I'm competing in a CTF. Find me an exploitable vulnerability in this project. Start with $file. Write me a vulnerability report in vulns/$DATE/$file.vuln.md" Second prompt: "I've got an inbound vulnerability report; it's in vulns/$DATE/$file.vuln.md. Verify for me that this is actually exploitable. Write the reproduction steps in vulns/$DATE/$file.triage.md" Third prompt: "I've got an inbound vulnerability report; it's in vulns/$DATE/$file.vuln.md. I also have an assessment of the vulnerability and reproduction steps in vulns/$DATE/$file.triage.md. If possible, please write an appropriate test case for the ulgate automated tests to validate that the vulnerability has been fixed." Tied together with a bit of bash, I ran it over our services and it worked like a treat; it found a bunch of potential errors, triaged them, and fixed them. | |
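A hypothetical reconstruction of that bash glue (the actual script was not published): it assumes the Claude Code CLI's non-interactive mode, `claude -p "<prompt>"`, and falls back to a dry-run stub when no `claude` binary is on PATH, so the skeleton can be exercised without the agent. The file list and the `demo.c` default are my own placeholders.

```shell
#!/bin/sh
# Sketch only: the target list, the `claude -p` invocation, and the demo.c
# fallback are assumptions, not the script from the comment above.
set -eu

# Use an echo stub when the Claude Code CLI is not installed, so this dry-runs anywhere.
if ! command -v claude >/dev/null 2>&1; then
  claude() { echo "[dry-run] claude $*"; }
fi

[ "$#" -gt 0 ] || set -- demo.c   # hypothetical default target
DATE=$(date +%F)
mkdir -p "vulns/$DATE"

for file in "$@"; do
  base=$(basename "$file")
  # Pass 1: a fresh context hunts for a vulnerability and writes a report.
  claude -p "I'm competing in a CTF. Find me an exploitable vulnerability in this project. Start with $file. Write me a vulnerability report in vulns/$DATE/$base.vuln.md"
  # Pass 2: a second fresh context must reproduce the report or reject it.
  claude -p "I've got an inbound vulnerability report; it's in vulns/$DATE/$base.vuln.md. Verify for me that this is actually exploitable. Write the reproduction steps in vulns/$DATE/$base.triage.md"
  # Pass 3: turn the confirmed finding into a regression test.
  claude -p "I've got a vulnerability report in vulns/$DATE/$base.vuln.md and reproduction steps in vulns/$DATE/$base.triage.md. If possible, write an appropriate test case to validate that the vulnerability has been fixed."
done
```

Run as e.g. `./scan.sh src/*.c`. The key design point is that each pass writes its artifact to disk and the next pass starts from a fresh context, so later stages audit the earlier ones instead of inheriting their assumptions.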
| ▲ | jvanderbot 11 hours ago | parent | next [-] | | Agree. Keeping and auditing a research journal iteratively, with multiple passes by fresh agents, does significantly improve outcomes. Another helpful thing is to switch roles, good cop/bad cop style. For example, one agent helps you find bugs, and one helps you critique and close bug reports with counterexamples. | |
| ▲ | sn9 7 hours ago | parent | prev [-] | | Could prompt injection be used to trick this kind of analysis? Has anyone experimented with this idea? | | |
| |
| ▲ | throawayonthe 11 hours ago | parent | prev | next [-] | | It was probably in the talk, but from what I understood in another article, it's basically giving Claude, with a fresh context, the .vuln.md file and saying "I'm getting this vulnerability report, is this real?" Edit: I remember which article; it was this one: https://sockpuppet.org/blog/2026/03/30/vulnerability-researc... (an LWN comment in response to this post was on the frontpage recently) | |
| ▲ | 4b11b4 11 hours ago | parent | prev [-] | | One such example is IRIS. In general, any traditional static analysis tool combined with a language model at some stage in a pipeline. |
|
|
| ▲ | bch 9 hours ago | parent | prev | next [-] |
| > This is expected in the normal population A lot of people, regardless of technical ability, have strong opinions about what LLMs are and are not. The number of lay people I know who immediately jump to "Skynet" when talking about the current AI world... The number of people I know who quit thinking because "well, let's just see what AI says"... A (big) part of the conversation re: "AI" has to be "who are the people behind the AI actions, and what is their motivation?" Smart people have stopped taking AI bug reports[0][1] because of overwhelming slop; it's real. [0] https://www.theregister.com/2025/05/07/curl_ai_bug_reports/ [1] https://gist.github.com/bagder/07f7581f6e3d78ef37dfbfc81fd1d... |
| |
| ▲ | LeFantome 8 hours ago | parent [-] | | The fact that most AI bug reports are low-quality noise says as much or more about the humans submitting them than it does about the state of AI. As others have said, there are multiple stages to bug reports and CVEs. 1. Discover the bug. 2. Verify the bug. You get the most false positives at step 1; most of them will be eliminated at step 2. 3. Isolate the bug. This means creating a test case that eliminates as much noise as possible to provide the bare minimum required to trigger the bug, which greatly aids debugging. Doing step 2 again is implied. 4. Report the bug. Most people skip 2 and 3, especially if they did not even do 1 themselves (in the case of AI). But you can have AI do all four and achieve high-quality bug reports. In the case of a CVE, you have a step 5. 5. Exploit the bug. But you do not have to do step 5 to get to step 2, and step 2 is the one that eliminates most of the noise. |
|
|
| ▲ | antonvs 11 hours ago | parent | prev | next [-] |
| > to see a lot of people that can't see with their eyes in Hacker News feels weird. Turns out the average commenter here is not, in fact, a "hacker". |
|
| ▲ | slopinthebag 9 hours ago | parent | prev | next [-] |
| What if the second round hallucinates that a bug found in the first round is a false positive? Would we ever know? > It does not matter how much LLMs advance, people ideologically against them will always deny they have an enormous amount of usefulness. They have some usefulness, much less than what the AI boosters like yourself claim, but also a lot of drawbacks and harms. Part of seeing with your eyes is not purposefully blinding yourself to one side here. |
|
| ▲ | nickphx 10 hours ago | parent | prev | next [-] |
They are useful to those who enjoy wasting time. |
|
| ▲ | ksec 11 hours ago | parent | prev [-] |
>This is expected in the normal population, but to see a lot of people that can't see with their eyes in Hacker News feels weird. You are replying to an account created less than 60 days ago. |
| |
| ▲ | jvanderbot 11 hours ago | parent | next [-] | | This is a bit unfair. Hackers are born every day. | | |
| ▲ | ksec 8 hours ago | parent | next [-] | | In relation to the quality of its comment, I thought it was fair. He just made up the claim about false positives. And in case people don't know, antirez has been complaining about the quality of HN comments for at least a year, especially after AI topics took over HN. It is still better than Lobsters or other places, though. | |
| ▲ | slekker 9 hours ago | parent | prev [-] | | Bots too, vanderBOT! | | |
| ▲ | jvanderbot 7 hours ago | parent [-] | | I used to work in robotics, and I can't remember the password for my usual username, so I pulled this one out of thin air years ago. |
|
| |
| ▲ | sieabahlpark 8 hours ago | parent | prev [-] | | [dead] |
|