tptacek a day ago
Wait, I don't understand why Heartbleed is at all hard for an agent loop to uncover. There's a pattern for these attacks (we found one in nginx in the ordinary course of a web app pentest at Matasano, and we didn't find it based on code, though I don't concede that an LLM would have a hard time uncovering these kinds of issues in code either). I think people are coming to this with the idea that a pentesting agent is pulling all its knowledge of vulnerabilities and testing patterns out of its model weights. No. The whole idea of a pentesting agent is that the agent code (the human-mediated code that governs the LLM) encodes a large amount of knowledge about how attacks work.
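To make the "pattern" concrete, here is a minimal toy sketch of a Heartbleed-style over-read and the kind of check that agent code could encode. It involves no TLS and no network: the wire format (a 2-byte length followed by the payload), the `SECRET` buffer, and all function names are assumptions made up for illustration, not the real heartbeat message layout or anyone's actual tooling.

```python
# Toy model of a Heartbleed-style over-read. Everything here is hypothetical:
# a length-prefixed echo against a simulated "process memory" buffer.
import struct

SECRET = b"-----PRIVATE KEY-----"  # stands in for adjacent memory contents


def build_request(payload: bytes, claimed_len: int) -> bytes:
    """Pack a 2-byte big-endian claimed length followed by the real payload."""
    return struct.pack(">H", claimed_len) + payload


def vulnerable_echo(request: bytes) -> bytes:
    """Trusts the claimed length, so it can read past the payload."""
    (claimed_len,) = struct.unpack(">H", request[:2])
    memory = request[2:] + SECRET  # payload sits next to unrelated data
    return memory[:claimed_len]


def patched_echo(request: bytes) -> bytes:
    """Drops requests whose claimed length exceeds the payload received."""
    (claimed_len,) = struct.unpack(">H", request[:2])
    payload = request[2:]
    if claimed_len > len(payload):
        return b""
    return payload[:claimed_len]


def probe(echo) -> bool:
    """Return True if the handler sends back more bytes than it was given."""
    payload = b"ping"
    response = echo(build_request(payload, claimed_len=len(payload) + 16))
    return len(response) > len(payload)


if __name__ == "__main__":
    print("vulnerable handler leaks:", probe(vulnerable_echo))
    print("patched handler leaks:", probe(patched_echo))
```

The probe only compares what was sent with what came back, so it needs no access to the handler's source, which is roughly the black-box situation described above.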
cookiengineer a day ago
I think I'd distinguish between source code audits (where LLMs are already pretty good at spotting bugs if you can convince them to) and exploit development here.

The former is already largely automated with fuzz testing of all kinds, so you wouldn't need an LLM if you knew what you were doing and had a TDD workflow or similar that checks for memory leaks (say, with valgrind or similar approaches).

The latter is what I was referring to, where I initially had hope that DNCs could help. I'd say that right now LLMs cannot discover these exploits, only repeat and translate them (e.g. similar vulnerabilities discovered by humans in the past in another programming language).

I'm talking specifically about discovery here because transformers lose symbolic inference, and that's why you can't use them for exploit generation. At least I wasn't able to make them work for the DARPA challenges, and had to use an AlphaGo-based model combined with a CPPN and some techniques that worked in ES/HyperNEAT.

I suppose what I'm trying to say is that LLMs lack an understanding of memory and time, and that understanding is usually manually encoded (governed, as you put it) by humans. I would not count that as an LLM doing it, because you could have just automated the tool use without an LLM and gotten identical results (thinking e.g. of an MCP for kernel memory maps or, say, valgrind or AFL).
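As a rough illustration of "automating the tool use without an LLM", here is a minimal sketch of a leak check that shells out to valgrind and parses its summary. The function names are hypothetical, and the parser assumes Memcheck's usual "definitely lost: N bytes in M blocks" summary line on stderr; treat it as a sketch under those assumptions rather than a robust harness.

```python
# Minimal sketch: run a binary under valgrind and extract the leak summary.
# Assumes valgrind is installed and reports its leak summary on stderr.
import re
import subprocess

# Matches e.g. "definitely lost: 1,024 bytes in 2 blocks" (assumed format).
LEAK_RE = re.compile(r"definitely lost: ([\d,]+) bytes in ([\d,]+) blocks")


def parse_definitely_lost(valgrind_stderr: str) -> int:
    """Return the 'definitely lost' byte count, or 0 if no such line exists."""
    match = LEAK_RE.search(valgrind_stderr)
    return int(match.group(1).replace(",", "")) if match else 0


def check_binary(path: str, *args: str) -> int:
    """Run `path` under valgrind and return the definitely-lost byte count."""
    result = subprocess.run(
        ["valgrind", "--leak-check=full", path, *args],
        capture_output=True,
        text=True,
        check=False,  # the program under test may exit non-zero
    )
    return parse_definitely_lost(result.stderr)
```

A test suite could call something like `check_binary("./my_test_binary")` (a placeholder path) and fail when the result is non-zero, which is the kind of deterministic check that needs no model in the loop.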