| ▲ | xiaoyu2006 7 hours ago | |
The quick test doesn't show a lot - by out straight asking if this is a security patch, it implies and guides AI to have output more probably to agree on this assumption. A confusion matrix is more useful. Nonetheless of course this is not a detailed ai capability testing blog. | ||
| ▲ | jefftk 6 hours ago | parent | next [-] | |
[author] I agree it is not much additional evidence! If someone wanted to try running the same test on a series of N commits from that list including this one I'd be very curious to see the answer! | ||
| ▲ | cubefox 6 hours ago | parent | prev [-] | |
Yeah, ideally we would need the phi coefficient (aka MCC, the binary Pearson correlation), which can be calculated from a confusion matrix of yes/no LLM classifications for all kernel diffs. (Number of true positives, true negatives, false positives, false negatives.) | ||