lorenzoguerra 7 hours ago
> We also ran the full dataset of 263 URLs (254 phishing, 9 confirmed legitimate) through Muninn's automatic scan. This is the scan that runs on every page you visit without any action on your part. On its own, the automatic scan correctly identified 238 of the 254 phishing sites and only incorrectly flagged 6 legitimate pages.

...so it has a false positive rate of 67%? On a ridiculously small dataset?
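For what it's worth, the figures both sides are working from check out. A quick sanity check using only the numbers quoted in the post (the variable names here are just for illustration):

```python
# Numbers quoted in the post above.
phishing_total = 254
phishing_caught = 238
legit_total = 9
legit_flagged = 6

# True positive rate (detection rate) on the phishing URLs.
detection_rate = phishing_caught / phishing_total

# False positive rate on the 9 confirmed-legitimate URLs.
fp_rate = legit_flagged / legit_total

print(f"detection rate: {detection_rate:.1%}")       # 93.7%
print(f"false positive rate: {fp_rate:.1%}")         # 66.7%
```

So the ~67% is real arithmetic, but as a rate it's computed over only 9 clean pages.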
jdup7 6 hours ago
Fair point: in isolation that number doesn't look good. The important context is that this dataset was built to test phishing detection, not to measure false positive rates on normal traffic. It's sourced from our threat intelligence tooling, so it's almost entirely malicious URLs by design. The 9 clean sites aren't a random sample of everyday browsing; they're sites that were submitted as suspicious and turned out to be legitimate, which makes them just about the hardest possible set of clean pages to classify correctly. This seems to be a common critique, and we definitely could have done a better job of explaining the methodology. Going forward we'll include numbers from daily use to give a better picture of the FP rate.