I don't understand the metric they're using, which is maybe to be expected of an article that looks LLM-written. But they started with ~250 URLs; that's a weirdly small sample. I'm sure there are tens of thousands of malicious websites cropping up monthly. And I bet that Safe Browsing flags more than 16% of those?

So how did they narrow it down to that small number? Why these sites specifically? What's the false positive / negative rate of both approaches? What's even going on?

john_strinlai 7 hours ago | parent | next [-]

>what's the false positive / negative rate of both approaches

the false positive rate is 100%. they just say everything is phishing:

"When we ran the full dataset through the deep scan, it caught every single confirmed phishing site with zero false negatives. The tradeoff is that it flagged all 9 of the legitimate sites in our dataset as suspicious, which is worth it when you're actively investigating a link you don't trust."

	lorenzoguerra 6 hours ago | parent [-]
		it's 100% for what they call "deep scan", it's 66.7% for the "automatic scan". Practically unusable anyway
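
For anyone checking the arithmetic, here's a minimal Python sketch of those two rates. The 9 legitimate sites come from the quote above; reading 66.7% as 6 of 9 flagged is an assumption, and `false_positive_rate` is a hypothetical helper, not anything from the article.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """FPR = FP / (FP + TN): the share of legitimate sites wrongly flagged."""
    legit_total = false_positives + true_negatives
    if legit_total == 0:
        raise ValueError("need at least one legitimate site")
    return false_positives / legit_total


# Deep scan: all 9 legitimate sites flagged (per the quoted article text).
deep_scan_fpr = false_positive_rate(false_positives=9, true_negatives=0)

# Automatic scan: assumption that 66.7% of 9 means 6 flagged, 3 passed.
auto_scan_fpr = false_positive_rate(false_positives=6, true_negatives=3)

print(f"deep scan FPR: {deep_scan_fpr:.1%}")
print(f"automatic scan FPR: {auto_scan_fpr:.1%}")
```

With only 9 legitimate sites in the denominator, each single site moves the rate by about 11 percentage points, so neither number is a tight estimate.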

jdup7 7 hours ago | parent | prev [-]

Probably could have been a bit more descriptive around the dataset. Our tooling pulls in a lot more than 250 URLs, but since we are manually confirming them, that means a smaller dataset. In other words, out of the URLs we pulled in, these 250 were confirmed (by a human) as phishing. We did not do any selection beyond that. As for the article, LLMs were used to help with the graphs and grammatical checks, but that's it. This was our first month of going through this exercise, and we definitely want to have larger datasets going forward as we expand capacity for review.

As for Safe Browsing catching more than 16%, it depends on the timeline. At the time these attacks are launched, it's likely Safe Browsing catches closer to 0%, but as time goes on that number definitely climbs.