rwmj 7 hours ago

With the exploits, you can try them and they either work or they don't. An attacker is not especially interested in analysing why the successful ones work.

With the CVE reports some poor maintainer has to go through and triage them, which is far more work, and very asymmetrical because the reporters can generate their spam reports in volume while each one requires detailed analysis.

SchemaLoad 7 hours ago | parent | next [-]

There have been several notable posts where maintainers found there was no bug at all, or the example code didn't even call code from their project and had merely demonstrated that running a Python script can do things on your computer. Entirely AI-generated issue reports and examples, wasting maintainer time.

simonw 7 hours ago | parent | next [-]

My hunch is that the dumbasses submitting those reports weren't actually using coding agent harnesses at all - they were pasting blocks of code into ChatGPT or other non-agent-harness tools, asking for vulnerabilities, and reporting whatever came back.

An "agent harness" here is software that directly writes and executes code to test that it works. A vulnerability reported by such an agent harness with included proof-of-concept code that has been demonstrated to work is a different thing from an "exploit" that was reported by having a long context model spit out a bunch of random ideas based purely on reading the code.
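The distinction can be sketched in a few lines (a toy illustration, not any real harness's code - the function name and workflow here are hypothetical): the step a harness adds is actually executing the candidate proof-of-concept and discarding it if it doesn't run.

```python
import subprocess
import sys
import tempfile

def demonstrated_to_work(poc_source: str) -> bool:
    """Execute a candidate proof-of-concept and report it only if it
    actually runs to completion -- the verification step that a
    'paste code into a chat window' workflow skips entirely."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(poc_source)
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, timeout=30)
    return result.returncode == 0

# A PoC that merely *reads* plausible gets filtered out unless it runs:
assert demonstrated_to_work("print('poc ran')")
assert not demonstrated_to_work("raise RuntimeError('never actually worked')")
```

A long-context model reading source and listing "vulnerabilities" never reaches that second step, which is where most of the spam reports would die.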

I'm confident you can still find dumbasses who mess up using coding agent harnesses and create invalid, time-wasting bug reports. Dumbasses are gonna dumbass.

wat10000 4 hours ago | parent | prev [-]

I've had multiple reports with elaborate proofs of concept that boil down to things like calling dlopen() on a path to a malicious library and saying dlopen has a security vulnerability.
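The pattern reduces to something like this (sketched with Python's `ctypes`, which wraps `dlopen()`; the library chosen here is just a trusted system one for illustration): loading a library runs its code by design, so "load a malicious library and its code runs" describes the documented contract, not a flaw in the caller.

```python
import ctypes
import ctypes.util

# dlopen()'s documented behavior: loading a shared library executes its
# initialization code. Loading a path the attacker already controls
# therefore proves nothing about the host program -- the real bug would be
# letting untrusted input choose this path in the first place.
path = ctypes.util.find_library("m") or "libm.so.6"  # trusted system libm
lib = ctypes.CDLL(path)  # equivalent of dlopen(path, ...)
lib.cos.restype = ctypes.c_double
lib.cos.argtypes = [ctypes.c_double]
assert lib.cos(0.0) == 1.0  # the library's code ran, as intended
```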

0xDEAFBEAD an hour ago | parent | prev | next [-]

It can't be too long before Claude Code is capable of replication + triage + suggested fixes...

0xDEAFBEAD 27 minutes ago | parent | next [-]

BTW regarding "suggested fixes", an interesting attack would be to report a bug along with a prompt injection that causes Claude to suggest inserting a vulnerability into the codebase in question. So it's important to review bug-report-originated Claude suggestions extra carefully. (And watch for prompt injection attacks.)

Another thought is that reproducible builds become more valuable than ever, because it actually becomes feasible for lots and lots of devs to scan an entire codebase for vulns using an LLM and then verify reproducibility.
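The verification step is just a digest comparison (a minimal sketch; the byte strings stand in for real release artifacts): if a rebuild from the audited source is byte-identical to the shipped artifact, the code the LLM scanned is provably the code users run.

```python
import hashlib

def artifact_digest(data: bytes) -> str:
    """Digest used to compare an official artifact against a local rebuild."""
    return hashlib.sha256(data).hexdigest()

# With a reproducible build, rebuilding from the audited source yields a
# byte-identical artifact, so the digests match.
official = b"...release bytes..."  # placeholder for the published artifact
rebuild = b"...release bytes..."   # placeholder for a local rebuild
assert artifact_digest(official) == artifact_digest(rebuild)
```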

ares623 40 minutes ago | parent | prev [-]

Would you ever blindly trust it?

0xDEAFBEAD 36 minutes ago | parent [-]

No. I would probably do something like: Have Claude Code replicate + triage everything. If a report gets triaged as "won't fix", send an email to the reporter explaining what Claude found and why it was marked as "won't fix". Tell the reporter they still have a chance at the bounty if they think Claude made a mistake, but they have to pay a $10 review fee to have a human take a look. (Or a $1 LLM token fee for Claude to take another look, in case of simple confabulation.)
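That workflow can be sketched as a simple decision function (purely hypothetical - the field names, fee amounts, and messages just mirror the scheme described above): anything the model can't reproduce goes back to the reporter with the reasoning and the paid appeal options, rather than being silently closed.

```python
from dataclasses import dataclass

REVIEW_FEE_USD = 10  # human takes a second look
RETRY_FEE_USD = 1    # LLM token fee for a cheaper re-check

@dataclass
class Report:
    reproduced: bool   # did the agent's replication attempt succeed?
    explanation: str   # the agent's reasoning, sent back to the reporter

def triage(report: Report) -> str:
    """Hypothetical triage flow: replicate first, then either escalate
    or return a won't-fix with paid appeal options."""
    if report.reproduced:
        return "escalate to maintainer"
    return (f"won't fix: {report.explanation}; appeal options: "
            f"${REVIEW_FEE_USD} human review or ${RETRY_FEE_USD} LLM retry")
```

The fee is the interesting part: it makes bulk spam cost the spammer something while leaving a path open for legitimate reporters who think the model confabulated.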

Note I haven't actually tried Claude Code (not coding due to chronic illness), so I'm mostly extrapolating based on HN discussion etc.

airza 7 hours ago | parent | prev [-]

All the attackers I’ve known are extremely, pathologically interested in understanding why their exploits work.

pixl97 5 hours ago | parent [-]

Very often they need to understand it well in order to chain exploits.