NitpickLawyer 3 hours ago
This is a trendy article, rehashing themes that were prevalent over the last year, and, like those themes, it will age like milk. If you look at the past 3 years and plot capabilities in 3 key areas, the conclusions look vastly different.

Code completion was "awww, how cute, this almost looks like Python" in early 2023. It's now at the level of "oh my, this actually looks decent".

Then there's the e2e "agentic" stuff, where 2 years ago you needed tons of glue to get a workflow that succeeded 50% of the time. Now you have agents taking a spec, working for 2 hours uninterrupted, and delivering working, tested, linted code. Unattended.

Lastly, these capabilities have taken CTF solve rates from roughly 0% to ~80% since RL started being used to train these models. The first milestone was ~2 years ago, when a popular CTF site saw its first <10s capture on a new task. Now several companies sell CTF-solving as a service, and more and more competitions are being dominated by such agents.

So yeah, rehashing all the old "arguments" is a futile exercise. This thing keeps getting better. RL does something really interesting: it unlocks a fixation on task completion. Give the model a verifiable reward (e.g. capturing a flag) and it will bang its head against the wall until it gets that flag. More importantly, in security you don't need perfect accuracy, nor maj@n. What you're looking for is pass@n, which usually adds 20-30% on any benchmark (a quick sketch of the estimator is below). So, yeah, all your flags are belong to AI.

----

AI will compromise your cybersecurity posture, but only because our postures have been bad all along. It will find more and more exploits, and the value it delivers to red and blue teams will far outweigh the "bugs" and "exploits" that LLM-assisted coding will "bring". Those will get caught automatically as well. Meanwhile, there are vastly more grass-fed, guaranteed human-written, good old-fashioned bugs out there.
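To make the maj@n vs pass@n point concrete, here is a minimal sketch of the standard unbiased pass@k estimator (Chen et al., 2021, the HumanEval paper). The numbers in the example are purely illustrative, not from any benchmark, and the 20-30% uplift claimed above is an empirical observation that this formula doesn't prove:

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        # Unbiased pass@k estimator (Chen et al., 2021): probability that
        # at least one of k samples, drawn without replacement from n total
        # attempts of which c succeeded, is a success.
        if n - c < k:
            return 1.0  # every size-k draw is guaranteed to contain a success
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

    # Illustrative: 10 attempts at a flag, 3 captures.
    # maj@10 would call this a failure (the most common outcome is a miss),
    # but pass@5 ~= 0.92 -- and one capture is all an attacker needs.
    print(pass_at_k(10, 3, 5))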
rainonmoon 2 hours ago
Some citations would help your case a lot.