▲ rando77 2 days ago

Sounds like they need another agent to detect false positives (I joke, I joke)
▲ dotty- 2 days ago | parent [-]
You joke, but that's a very real approach that AI pentesting companies do take: an agent that creates reports, and an agent that 'validates' reports with 'fresh context' and a different system prompt that attempts to reproduce the vulnerability based on the report details.

*Edit: the paper seems to suggest they had a 'Triager' for vulnerability verification, and obviously that didn't catch all the false positives either, ha.
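A minimal sketch of the reporter/validator pattern described above: one agent drafts a finding, then a second agent with fresh context and a different system prompt tries to reproduce it before it is accepted. `call_llm` is a hypothetical stand-in for any chat-completion API, stubbed here with canned responses so the sketch runs; the prompts and the example report are illustrative, not from any real product.

```python
# Hypothetical prompts for the two roles. The validator deliberately gets
# a skeptical framing, unlike the reporter.
REPORTER_PROMPT = "You are a pentester. Report suspected vulnerabilities."
VALIDATOR_PROMPT = (
    "You are a skeptical triager. Given only the report below, attempt to "
    "reproduce the vulnerability. Answer CONFIRMED or NOT_REPRODUCED."
)

def call_llm(system_prompt: str, user_msg: str) -> str:
    # Stubbed model call so the sketch is self-contained; swap in a real
    # chat-completion client in practice.
    if "triager" in system_prompt:
        # Validator only confirms reports that include reproduction steps.
        if "steps to reproduce" in user_msg.lower():
            return "CONFIRMED"
        return "NOT_REPRODUCED"
    return (
        "Possible SQLi in /login. Steps to reproduce: ' OR 1=1 -- in the "
        "username field bypasses authentication."
    )

def validated_findings(targets: list[str]) -> list[str]:
    confirmed = []
    for target in targets:
        report = call_llm(REPORTER_PROMPT, f"Assess {target}")
        # Fresh context: the validator sees only the report text, never the
        # reporter's conversation history, so it cannot inherit its biases.
        verdict = call_llm(VALIDATOR_PROMPT, report)
        if verdict == "CONFIRMED":
            confirmed.append(report)
    return confirmed
```

The key design point is the context isolation: because the validator starts from a blank slate, a hallucinated finding with no reproducible details has a chance of being rejected rather than rubber-stamped, though as the edit above notes, this filter is far from perfect.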