cesaref 15 hours ago

I'm interested in the implications for the open source movement, specifically about security concerns. Does anyone know if there has been a study of how well Claude Code works on closed source (but decompiled) code?

skeledrew 14 hours ago | parent | next [-]

> Claude Code works on closed source (but decompiled) source

Very likely not nearly as well, unless there are many open source libraries in use and/or the language+patterns used are extremely popular. The really huge win for something like the Linux kernel and other popular OSS is that the source appears in the training data, a lot. And many versions. So providing the source again and saying "find X" is primarily bringing into focus things it's already seen during training, with little novelty beyond the updates that happened after knowledge cutoff.

Giving it a closed source project containing a lot of novel code means it only has the language and its "intuition" to work from, which is a far greater ask.

kasey_junk 14 hours ago | parent [-]

I’m not a security researcher, but I know a few and I think universally they’d disagree with this take.

The llms know about every previous disclosed security vulnerability class and can use that to pattern match. And they can do it against compiled and in some cases obfuscated code as easily as source.

I think the security engineers out there are terrified that the balance of power has shifted too far to the finding of closed source vulnerabilities because getting patches deployed will still take so long. Not that the llms are in some way hampered by novel code bases.

zahlman 5 hours ago | parent | next [-]

> The llms know about every previous disclosed security vulnerability class and can use that to pattern match

Do the reports include patterns that could be matched against decompiled code, though? As easily as they would against proper source? I find it a bit hard to believe.

skeledrew 13 hours ago | parent | prev [-]

Many vulnerabilities aren't just pattern matching though; deep understanding of the context in the particular codebase is also needed. And a novel codebase means more effort than usual spent grepping and keeping the context in focus, which makes it easier to miss things than if enough of the context were already encoded in the model weights.

Same thing applies to humans: the better someone knows a codebase, the better they will be at resolving issues, etc.

tptacek 10 hours ago | parent [-]

Almost all vulnerabilities are either direct applications of known patterns, incremental extensions of them, or chains of multiple such steps.

zahlman 5 hours ago | parent | prev | next [-]

Definitely not my wheelhouse, but I would expect it to be considerably worse.

Simply because source code contains names that were intended to communicate meaning in a way the LLM is specifically trained to understand: identifiers drawn from human natural language, chosen to scan well when interspersed with the programming language's grammar, plus comments and so on. At least if debugging information has been scrubbed, anyway (and the comments definitely are). Ghidra et al. can only do so much to reconstruct the kind of semantic content an LLM is looking for.
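To make the point concrete, here's a contrived, illustrative pair (not from any real decompiler run): the same logic once with meaningful names, and once with the kind of auto-generated names a stripped binary gets in Ghidra-style output. The function and parameter names are invented for this example.

```python
# Illustration only: identical logic, with and without semantic names.

def has_expired(token_issued_at: int, now: int, ttl_seconds: int) -> bool:
    """Reject tokens older than their time-to-live."""
    return now - token_issued_at > ttl_seconds

# What the same function tends to look like after symbols are stripped:
# auto-generated names, no docstring, no comments.
def FUN_00401a2c(param_1: int, param_2: int, param_3: int) -> bool:
    return param_2 - param_1 > param_3

# Behaviorally identical; semantically, one hands the model the intent
# for free and the other makes it infer intent from structure alone.
print(has_expired(100, 200, 50) == FUN_00401a2c(100, 200, 50))
```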

tverbeure 4 hours ago | parent [-]

I've cut-and-pasted some assembly code into the free version of ChatGPT to reverse engineer some old binaries and its ability to find meaning was just scary.

steveklabnik 7 hours ago | parent | prev | next [-]

I’ve had Claude Code diagnose bugs in a compiler we wrote together by using gdb and objdump to examine binaries it produces. We don’t have DWARF support yet so it is just examining the binary. That’s not security work, but it’s adjacent to the sorts of skills you’re talking about. The binaries are way smaller than real programs, though.
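For a sense of what "just examining the binary" means without DWARF: even with no debug info, the fixed layout of the container format tells you a lot. A minimal sketch below parses a hand-crafted 64-bit ELF header with only the standard library; the header bytes are fabricated for illustration, not taken from any real compiler's output.

```python
import struct

# A hand-crafted 64-bit little-endian ELF header (64 bytes), for illustration.
header = (
    b"\x7fELF"      # magic
    + b"\x02"       # EI_CLASS: 64-bit
    + b"\x01"       # EI_DATA: little-endian
    + b"\x01"       # EI_VERSION
    + b"\x00" * 9   # padding
    + struct.pack(
        "<HHIQQQIHHHHHH",
        2,           # e_type: ET_EXEC
        0x3E,        # e_machine: x86-64
        1,           # e_version
        0x401000,    # e_entry: entry point address
        64, 0,       # e_phoff, e_shoff
        0,           # e_flags
        64, 56, 1,   # e_ehsize, e_phentsize, e_phnum
        0, 0, 0,     # e_shentsize, e_shnum, e_shstrndx
    )
)

assert header[:4] == b"\x7fELF"
e_machine = struct.unpack_from("<H", header, 18)[0]  # offset 18: machine type
e_entry = struct.unpack_from("<Q", header, 24)[0]    # offset 24: entry point
print(hex(e_machine), hex(e_entry))
```

From there a tool (or a model reading objdump output) walks program headers, finds the entry point, and starts disassembling; the names are gone but the structure isn't.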

dolmen 5 hours ago | parent | prev [-]

It would be much more interesting/efficient if the LLM had tokens for machine instructions so extracting instructions would be done at tokenizing phase, not by calling objdump.

But I guess I'm not the first one to have that idea. Any references to research papers would be welcome.
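I don't have a paper to point to either, but the idea sketched in Python: instead of feeding the model hexdump text or objdump output, map machine-code bytes straight to instruction-level tokens at tokenization time. The table below covers only a handful of real single-byte x86-64 opcodes; an actual implementation would need a full variable-length decoder.

```python
# Sketch: byte -> instruction-token mapping, so a model sees
# "<push_rbp> <ret>" instead of raw byte soup. These five entries are
# genuine single-byte x86-64 opcodes; everything else falls back to a
# generic byte token. Real x86-64 decoding is variable-length and far
# more involved than a lookup table.
OPCODE_TOKENS = {
    0x55: "<push_rbp>",
    0x5D: "<pop_rbp>",
    0x90: "<nop>",
    0xC3: "<ret>",
    0xC9: "<leave>",
}

def tokenize(code: bytes) -> list[str]:
    return [OPCODE_TOKENS.get(b, f"<byte_{b:02x}>") for b in code]

print(tokenize(b"\x55\x90\xc9\xc3"))
# ['<push_rbp>', '<nop>', '<leave>', '<ret>']
```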

tverbeure 4 hours ago | parent [-]

As an experiment, I just now took a random section of a few hundred bytes (as a hexdump) from the /bin/ls executable and pasted them into ChatGPT.

I don't know if it's correct, but it speculated that it's part of a command line processor: https://chatgpt.com/share/69d19e4f-ff2c-83e8-bc55-3f7f5207c3...

Now imagine how much more it could have derived if I had given it the full executable, with all the strings, pointers to those strings and whatnot.
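The string-harvesting part is the easy bit, and it's the classic trick the Unix `strings` tool does: pull out runs of printable ASCII. A minimal stdlib-only version, run here on sample bytes standing in for a slice of an executable (not actual /bin/ls data):

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Extract runs of printable ASCII, like the Unix `strings` tool."""
    return [m.decode("ascii")
            for m in re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)]

# Fabricated stand-in for a chunk of binary data with embedded strings.
blob = b"\x00\x01usage: ls [OPTION]... [FILE]...\x00\x7f\x02--color\x00\x03"
print(extract_strings(blob))
# ['usage: ls [OPTION]... [FILE]...', '--color']
```

Feed a model those strings alongside the code that references them and it has far more to anchor its guesses on than bare opcodes.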

I've done some minor reverse engineering of old test equipment binaries in the past, and LLMs are incredible at figuring out what the code is doing, far better than the usual workflow of decompiling with Ghidra and reading the output.