tptacek a day ago

We're talking about different things here. A pentesting agent directly tests running systems. It's a (much) smarter version of Burp Scanner. It's going to find memory disclosure vulnerabilities the same way pentesters do, by stimulus/response testing. You can do code/test fusion to guide the stimulus/response loop, which will make the agent more efficient, but the limiting factor here isn't whether transformers lose symbolic inference.
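A minimal sketch of what stimulus/response testing for a memory disclosure bug looks like, assuming a toy in-process target rather than a real network service. Everything here (`vulnerable_echo`, `fixed_echo`, `probe_for_disclosure`, the fake "heap" bytes) is hypothetical and only illustrates the idea: the tester never reads the source. It just varies the stimulus (an inflated declared length, Heartbleed-style) and flags responses that return more bytes than were sent.

```python
# Hypothetical toy targets; a real agent would be talking to a live service.
SECRET_HEAP = b"session=abc123;key=hunter2"  # stand-in for adjacent process memory


def vulnerable_echo(payload: bytes, declared_len: int) -> bytes:
    # Simulated memory layout: the payload sits right before other data.
    memory = payload + SECRET_HEAP
    # Bug: trusts the client-declared length instead of len(payload).
    return memory[:declared_len]


def fixed_echo(payload: bytes, declared_len: int) -> bytes:
    # Correct behavior: never copy past the end of the actual payload.
    return payload[:declared_len]


def probe_for_disclosure(handler, payload: bytes, max_extra: int = 32) -> list[int]:
    """Send stimuli with inflated declared lengths and record every declared
    length whose response is longer than the payload (a disclosure signal)."""
    leaks = []
    for extra in range(0, max_extra + 1, 8):
        declared = len(payload) + extra
        response = handler(payload, declared)
        if len(response) > len(payload):
            leaks.append(declared)
    return leaks


if __name__ == "__main__":
    print(probe_for_disclosure(vulnerable_echo, b"ping"))  # [12, 20, 28, 36]
    print(probe_for_disclosure(fixed_echo, b"ping"))       # []
```

Code/test fusion would amount to using what you know about the source to pick better stimuli (which fields are length-prefixed, which handlers copy buffers) instead of sweeping blindly; the detection step stays black-box either way.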

Remember, the competition here is against human penetration testers. Humans are extremely lossy testing agents!

If the threshold you're setting is "LLMs can eradicate memory disclosure bugs by statically analyzing codebases to the point of excluding those vulnerabilities as valid propositions", no, of course that isn't going to happen. But nothing on the table today can do that either! That's not the right metric.

cookiengineer a day ago

> Humans are extremely lossy testing agents!

Ha, I laughed at that one. I suppose you're right :D