Thanks for sharing. Its definitely more concrete. Some of the things that I was hoping to find were, the number of false positives, the times it takes to identify the false positives from real ones, the taxation on human mind to perform this exercise. Did anyone manually verified the exploits which were identified by the LLM or were they assumed correct based on the explanation. I do understand that the target audience of these articles is probably the decision makers so the language and content has to be tailored accordingly.

▲

pixl97 4 hours ago | parent [-]

>, the number of false positives,

Really this is why the LLM needs to be able to write exploits for issues it finds. Of course that leads down a rabbit hole of other issues. But if an exploit works, then that's pretty conclusive evidence.

	▲	lacewing 4 hours ago \| parent [-]
		For a subset of bugs, yes. For some others, not really: I've seen LLMs make bogus assumptions about the threat model (in which case, the exploit works but doesn't demonstrate anything useful) or "cheat" by modifying the code to demonstrate a hallucinated issue. Frontier models, including Mythos, can greatly streamline bug hunting and exploit developments in the hands of a competent security engineer. In the hands of a person with no security experience, they will still mostly waste your time and money.