Remix clone Hacker News

	▲	xiaoyu2006 7 hours ago
		The quick test doesn't show much: asking outright whether this is a security patch is a leading question, and it nudges the AI toward output that agrees with that assumption. A confusion matrix would be more useful. That said, of course this isn't meant to be a detailed AI capability testing blog post.
	▲	jefftk 6 hours ago
		[author] I agree it is not much additional evidence! If someone wanted to try running the same test on a series of N commits from that list including this one I'd be very curious to see the answer!
	▲	cubefox 6 hours ago
		Yeah, ideally we would want the phi coefficient (aka MCC, the binary Pearson correlation), which can be calculated from a confusion matrix of yes/no LLM classifications over all kernel diffs (the number of true positives, true negatives, false positives, and false negatives).
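
For anyone who wants to try this, here is a minimal sketch of the calculation described above, assuming you already have a ground-truth label ("is this commit a security patch?") and a yes/no LLM answer for each kernel diff. The function names and the toy labels are hypothetical and only illustrate the arithmetic; they are not results from any real run.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) from parallel lists of booleans.

    y_true: ground truth, True if the commit really is a security patch.
    y_pred: the LLM's yes/no answer for the same commit.
    """
    tp = tn = fp = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth and guess:
            tp += 1
        elif not truth and not guess:
            tn += 1
        elif guess:
            fp += 1  # said "yes" on a non-security commit
        else:
            fn += 1  # said "no" on a security commit
    return tp, tn, fp, fn


def phi_coefficient(tp, tn, fp, fn):
    """Phi coefficient (MCC) from the four confusion-matrix cells.

    Returns 0.0 when any row or column total is zero, a common
    convention for the otherwise undefined 0/0 case.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy, made-up labels: one real security patch among four commits,
# and a model that answers "yes" to every leading question.
toy_truth = [True, False, False, False]
always_yes = [True, True, True, True]
print(phi_coefficient(*confusion_counts(toy_truth, always_yes)))
```

A model that says "yes" to everything gets the one real patch right but scores a phi of zero under this convention, which is the concern with the leading question: a single agreeing answer cannot be told apart from a model that always agrees.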