	▲	localuser13 4 hours ago
		Is it? Gemini 3-pro-preview and 3-flash-preview, ranked second and third respectively, had true positive rates of 44% and 37% and whopping false positive rates of 65% and 86%. This is worse than a coin toss. Any false positive rate above 0% (3% to be generous) is useless in the real world. That leaves only Grok and GPT, with success rates of 18%, 9% and 2%. In fact, this is what the authors said themselves: "However, this approach is not ready for production. Even the best model, Claude Opus 4.6, found relatively obvious backdoors in small/mid-size binaries only 49% of the time. Worse yet, most models had a high false positive rate — flagging clean binaries." So I'm not sure we're even discussing the same article. I also don't see a comparison with any other methodology. What is the success rate of ./decompile binary.exe | grep "(exec|system)/bin/sh"? What is the success rate of state-of-the-art alternative approaches?
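To put a rough number on why the false positive rate matters so much, here is a minimal sketch using Bayes' rule. The true and false positive rates are the ones quoted above; the 1% share of backdoored binaries is purely an assumption for illustration and does not come from the article, and `precision` is a hypothetical helper name.

```python
def precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Fraction of flagged binaries that are actually backdoored (Bayes' rule)."""
    true_alarms = tpr * prevalence            # backdoored and flagged
    false_alarms = fpr * (1.0 - prevalence)   # clean but flagged anyway
    return true_alarms / (true_alarms + false_alarms)


# ASSUMPTION: 1 in 100 scanned binaries is backdoored. Not from the article.
ASSUMED_PREVALENCE = 0.01

# (true positive rate, false positive rate) as quoted in the comment above.
quoted_rates = {
    "3-pro-preview": (0.44, 0.65),
    "3-flash-preview": (0.37, 0.86),
}

for name, (tpr, fpr) in quoted_rates.items():
    share = precision(tpr, fpr, ASSUMED_PREVALENCE)
    print(f"{name}: {share:.2%} of flagged binaries would really be backdoored")
```

Under that assumed prevalence, fewer than 1 in 100 flags from either model would point at a real backdoor. Even at a 3% false positive rate with the same 44% true positive rate, most flags would still be false alarms.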