Roark66 6 hours ago

And this one demonstration is why these "1000 CTOs claim no effectiveness improvement after introducing AI in their companies" stories are 100% BS.

They may not have noticed an improvement, but that doesn't mean there isn't any.

localuser13 4 hours ago | parent | next [-]

Is it? Gemini 3-pro-preview and 3-flash-preview, ranked second and third respectively, had true positive rates of 44% and 37% and whopping false positive rates of 65% and 86%. This is worse than a coin toss. A false positive rate of anything more than 0% (3% to be generous) is useless in the real world. This leaves only Grok and GPT, with success rates of 18%, 9% and 2%.
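
To make the "worse than a coin toss" point concrete, here is a rough sketch using only the two pairs of rates quoted above. It computes Youden's J (true positive rate minus false positive rate, where 0 means no better than random guessing) and the precision of a "backdoor" flag. The 50% share of backdoored binaries is my assumption, not a number from the article, and the function names are just for illustration.

```python
def informedness(tpr: float, fpr: float) -> float:
    """Youden's J statistic: TPR - FPR. Zero or below is no better than guessing."""
    return tpr - fpr


def precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Share of flagged binaries that really are backdoored, for a given prevalence."""
    flagged_backdoored = tpr * prevalence
    flagged_clean = fpr * (1.0 - prevalence)
    return flagged_backdoored / (flagged_backdoored + flagged_clean)


# (true positive rate, false positive rate) as quoted in the comment above.
rates = {
    "3-pro-preview": (0.44, 0.65),
    "3-flash-preview": (0.37, 0.86),
}

# ASSUMPTION: half of the scanned binaries are backdoored. Real-world
# prevalence is presumably far lower, which would push precision down further.
ASSUMED_PREVALENCE = 0.5

for name, (tpr, fpr) in rates.items():
    j = informedness(tpr, fpr)
    p = precision(tpr, fpr, ASSUMED_PREVALENCE)
    print(f"{name}: J = {j:+.2f}, precision = {p:.2f}")
```

Both J values come out negative (about -0.21 and -0.49), meaning each model flags clean binaries more readily than backdoored ones.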

In fact, this is what the authors said themselves: "However, this approach is not ready for production. Even the best model, Claude Opus 4.6, found relatively obvious backdoors in small/mid-size binaries only 49% of the time. Worse yet, most models had a high false positive rate — flagging clean binaries." So I'm not sure if we're even discussing the same article.

I also don't see a comparison with any other methodology. What is the success rate of ./decompile binary.exe | grep "(exec|system)/bin/sh"? What is the success rate of state-of-the-art alternative approaches?

snovv_crash 5 hours ago | parent | prev | next [-]

Even without AI, many (most?) orgs are held back by internal processes and politics, not development speed.

HeWhoLurksLate 6 hours ago | parent | prev [-]

It also generally takes a heck of a noisy bang for internal developments to make it to the C-suite.