_alternator_ an hour ago

The article focuses on OSS, but closed-source software is at major risk too. Perhaps more.

It's gotten much easier to reverse engineer binaries in general, and security patches in particular. Basically, an LLM can turn binaries into 'readable' code, and then reason about said code.
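As a toy illustration of that "binary in, readable listing out" step, Python's stdlib `dis` module can stand in for a native disassembler: compile a small check, dump its bytecode, and the constants an analyst (or an LLM) would reason about fall right out. The `check_license` function and its constants are invented for the example, not taken from any real product.

```python
import dis
import io

# Compile a small function, then recover a readable listing from its
# bytecode -- a toy stand-in for what Ghidra/IDA plus an LLM do to
# native binaries (names and values here are illustrative only).
def check_license(key: str) -> bool:
    return len(key) == 16 and key.startswith("AB")

buf = io.StringIO()
dis.dis(check_license, file=buf)
listing = buf.getvalue()

# The listing exposes the comparison constants an analyst would key on:
# the length 16 and the "AB" prefix both appear as loaded constants.
print(listing)
```

The same idea scales up: once the binary is lifted into any textual form, the secret being "compiled away" is still sitting there in the constants and control flow.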

salsakran an hour ago | parent | next [-]

Perhaps -- but I think for most people, the vast majority of proprietary software they consume is over the network.

But yeah, if you're distributing binaries publicly, then you're going to have very similar problems.

redanddead an hour ago | parent [-]

That happens a lot, though. Even OpenAI recently tried to lock functionality (like computer use, about two weeks ago) behind a binary -- Mac only, they said, no EU -- and I saw someone crack it the same day and port it to Windows. There are many, many things like Rive that ship binaries. Obfuscation and uglification have been the name of the proprietary game for a long time, with the only protection being the assumption that "nobody would go through that trouble." Well, an LLM will ralph-loop through it all day long and make what you paid good money for essentially free for anyone to use whenever they feel like it. We're back to the "you wouldn't download a car, would you?" argument.

twism an hour ago | parent | prev | next [-]

Does it even need to turn it into readable code?

_alternator_ 5 minutes ago | parent [-]

My understanding is that decompilation into more readable code is an important step in building the path to an exploit.

This understanding may be incomplete or outdated (things are moving very fast right now). I'd love to hear from someone with more experience using LLMs for binary analysis about how much 'binary annotation' LLMs need relative to humans.
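The classic workflow hinted at upthread is patch diffing: compare the pre- and post-patch code to localize the fix, which in turn points straight at the vulnerability. A minimal sketch with stdlib `difflib`, using invented pseudo-decompiled snippets (the function, its names, and `RECORD_MAX` are all hypothetical):

```python
import difflib

# Toy patch-diffing sketch: diff "decompiled" pre- and post-patch
# versions of a function to localize the security fix. The snippets
# are invented pseudo-output, not from any real product.
before = """\
int read_record(char *dst, const char *src, int n) {
    memcpy(dst, src, n);
    return n;
}
"""
after = """\
int read_record(char *dst, const char *src, int n) {
    if (n > RECORD_MAX) return -1;
    memcpy(dst, src, n);
    return n;
}
"""

added = [line[1:] for line in difflib.unified_diff(
            before.splitlines(), after.splitlines(), lineterm="")
         if line.startswith("+") and not line.startswith("+++")]

# The single added line is the bounds check -- exactly the spot an
# attacker would study to reconstruct the original flaw.
print(added)
```

An LLM slots into this loop by explaining *why* the added check matters and suggesting how the unpatched path could be reached, which is the part that used to require an experienced human.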

edrobap an hour ago | parent | prev [-]

I did a fair bit of reverse-engineering jar files in the pre-LLM era for various reasons. The biggest problem with decompiled Java files was naming: the original variable names, class names, etc. were not retained, and the decompiler would substitute an alphanumeric series, which made the code very hard to read. I'm curious how current LLMs address this. Presumably they can figure out how each class or variable is used and name it accordingly. (All this assumes the original code was readable in the first place; there are enough bad programmers that it often isn't.)
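The renaming step can be sketched mechanically: given a rename map of the kind an LLM might propose after inferring each identifier's role from usage, a whole-word substitution turns the decompiler's series back into readable code. Both the "decompiled" snippet and the map below are hand-written stand-ins, not real decompiler or model output.

```python
import re

# Decompiler-style output: identifiers reduced to an alphanumeric
# series, as typical decompiled-jar output looked (snippet invented).
decompiled = """\
public static boolean a(String var0, int var1) {
    return var0.length() == var1;
}
"""

# A rename map of the kind an LLM might propose after reading the
# usage sites (hand-written here, standing in for model output).
renames = {"a": "hasExpectedLength", "var0": "text", "var1": "expected"}

# Whole-word substitution so "var0" never matches inside "var01" etc.
pattern = re.compile(r"\b(" + "|".join(renames) + r")\b")
readable = pattern.sub(lambda m: renames[m.group(1)], decompiled)

print(readable)
```

The hard part, of course, is producing a good map, not applying it -- and inferring roles from usage is exactly the kind of pattern matching LLMs are strong at.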

roenxi an hour ago | parent [-]

I expect Java would be easy mode for the AI. In my experience they already do quite well reconstructing C++ from Ghidra output -- I tried this when I wanted to know what damage formula some game was using.
