What types of vulnerabilities was it finding? Cross site scripting, privilege escalation, etc? Mostly memory corruption or any Javascript logic bugs?

▲

IainIreland 5 hours ago | parent | next [-]

I work on SpiderMonkey, so I mostly looked at the JS bugs. It was a smorgasbord of various things. Broadly speaking I'd say the most impressive bugs were TOCTOU issues, where we checked something and later acted on it, and the testcase found a clever way to invalidate the result of the check in between.

If you look closely at, say, this patch, you might get a sense of what I mean (although the real cleverness is in the testcase, which we have not made public): https://hg-edge.mozilla.org/integration/autoland/rev/c29515d...

▲

reisse 5 hours ago | parent | next [-]

> although the real cleverness is in the testcase, which we have not made public

What is the point of keeping it private? I'd bet feeding this patch to Opus and asking to look for specific TOCTOU issue fixed by the patch will make it come up with a testcase sooner or later.

	▲	IainIreland 5 hours ago \| parent \| next [-]
		The same is also true of a good security researcher, and has been for a long time. The question is mostly whether it takes long enough to come up with a testcase that we've managed to ship the fix to all affected releases, and given people some time to update. (And maybe LLMs do change the calculus there! We'll have to wait and see.)
	▲	mccr8 5 hours ago \| parent \| prev [-]
		Possibly! One of the many areas that might need rethinking in the age of AI (that started in February of this year) is how long security bugs should be hidden. We live in interesting times.

▲

paulvnickerson 5 hours ago | parent | prev [-]

Very cool, thank you.

▲

mccr8 5 hours ago | parent | prev [-]

I'd say it leans towards memory corruption kinds of issues, as those are easiest to pass the validator, thanks to AddressSanitizer. I think there's a lot of potential for making the validator more sophisticated. Like maybe you add a JS function that will only crash when run in the parent process and have a validator that checks for that specific crash, as a way for the LLM to "prove" that it managed to run arbitrary JS in the parent. Would that turn up subtler issues? Maybe.