Tokens can also be burnt on decompilation.

tptacek 11 hours ago | parent | next [-]

Yes, and it apparently burns lots of tokens. But what I've heard is that the outcomes are drastically less expensive than hand-reversing was, when you account for labor costs.

jeffmcjunkin 10 hours ago | parent | next [-]

Can confirm. Matching decompilation in particular (where you match the compiler along with your guess at source, compile, then compare assembly, repeating if it doesn't match) is very token-intensive, but it's now very viable: https://news.ycombinator.com/item?id=46080498
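For readers who haven't seen it, the matching loop described above can be sketched roughly as follows. This is a minimal illustration, not any particular project's tooling: `propose_source` (a stand-in for the LLM call) and `compile_to_asm` (a stand-in for invoking the matched compiler and disassembling the result) are hypothetical callables you would supply, and the `#` comment character in `normalize_asm` is an assumption that depends on your assembler syntax.

```python
import re
from typing import Callable, Optional


def normalize_asm(asm: str) -> list[str]:
    """Drop comments, blank lines, and whitespace differences before comparing."""
    lines = []
    for raw in asm.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if line:
            lines.append(re.sub(r"\s+", " ", line))
    return lines


def match_decompile(
    target_asm: str,
    propose_source: Callable[[str, Optional[str]], str],  # hypothetical LLM call
    compile_to_asm: Callable[[str], str],                 # hypothetical compiler wrapper
    max_attempts: int = 10,
) -> Optional[str]:
    """Return source whose compiled assembly matches target_asm, or None."""
    want = normalize_asm(target_asm)
    feedback = None
    for _ in range(max_attempts):
        # Guess at the source, given the target and the last mismatch (if any).
        source = propose_source(target_asm, feedback)
        got = normalize_asm(compile_to_asm(source))
        if got == want:
            return source
        # Hand the mismatching assembly back so the next guess can correct it.
        feedback = "\n".join(got)
    return None
```

Every failed attempt costs a full model call plus a compile, which is where the token bill comes from; `max_attempts` is the knob that caps it.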

Of course LLMs see a lot more source-assembly pairs than even skilled reverse engineers, so this makes sense. Any area where you can get unlimited training data is one we expect to see top-tier performance from LLMs.

(also, hi Thomas!)

stackghost 9 hours ago | parent | next [-]

My own experience has been that "ghidra -> ask LLM to reason about ghidra decompilation" is very effective on all but the most highly obfuscated binaries.

Burning tokens by asking the LLM to compile, disassemble, compare assembly, recompile, repeat seems very wasteful and inefficient to me.

	mikestaas 8 hours ago | parent | next [-]
		LaurieWired did a good episode about that kind of thing https://www.youtube.com/watch?v=u2vQapLAW88
	kimixa 6 hours ago | parent | prev [-]
		That matches my experience too - LLMs are very capable at "translating" between domains - one of the best experiences I've had with LLMs is turning "decompiled" source into "human readable" source. I don't think that "Binary Only" closed-source is the defense against this that some people here seem to think it is.

echelon 6 hours ago | parent | prev [-]

Has anyone used an LLM to deobfuscate compiled JavaScript?

	saagarjha 21 minutes ago | parent | next [-]
		I've used it for hobby efforts on Electron/React Native (Hermes bytecode) apps and it seems to work reasonably well
	bitexploder 6 hours ago | parent | prev [-]
		Yep. They are good at it.

gfosco 7 hours ago | parent | prev [-]

Yeah, it's token intensive but worth it. I built a very dumb example harness which used IDA via MCP and analyzed/renamed/commented all ~67k functions in a binary, using Claude Haiku for about $150. A local model could've accomplished it for much less/free. The knowledge base it outputs and the marked up IDA db are super valuable.
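To make the shape of such a harness concrete, here is a minimal sketch. It is not the harness described in the comment: `ask_model` (a stand-in for the LLM call) and `write_back` (a stand-in for whatever renames and comments functions in the disassembler, for example via an MCP tool) are hypothetical callables, and the `Annotation` fields are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Annotation:
    name: str     # suggested function name
    comment: str  # short summary to attach to the function


def annotate_all(
    functions: Iterable[tuple[int, str]],        # (address, decompiled text)
    ask_model: Callable[[str], Annotation],      # hypothetical LLM call
    write_back: Callable[[int, Annotation], None],  # hypothetical disassembler update
) -> dict[int, Annotation]:
    """Annotate every function once and collect the results as a knowledge base."""
    knowledge_base: dict[int, Annotation] = {}
    for address, pseudocode in functions:
        note = ask_model(pseudocode)
        write_back(address, note)        # rename/comment in the database
        knowledge_base[address] = note   # keep a copy outside the database
    return knowledge_base


# Rough per-function cost from the figures above: about $150 over ~67k functions.
cost_per_function = 150 / 67_000  # roughly $0.0022, about a fifth of a cent
```

At roughly a fifth of a cent per function, the per-call cost matters less than the number of functions you choose to send through the loop.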

	whattheheckheck 6 hours ago | parent [-]
		Do you have the repo example?

somesortofthing 11 hours ago | parent | prev | next [-]

Another asymmetric advantage for defenders - attackers need to burn tokens to form incomplete, outdated, and partially wrong pictures of the codebase while the defender gets the whole latest version plus git history plus documentation plus organizational memory plus original authors' cooperation for free.

echelon 6 hours ago | parent | prev [-]

> Tokens can also be burnt on decompilation.

Prediction 1. We're going to have cheap "write Photoshop and AutoCAD in Rust as a new program / FOSS" soon. No desktop software will be safe. Everything will be cloned.

Prediction 2. We'll have a million Linux and Chrome and other FOSS variants with completely new codebases.

Prediction 3. People will trivially clone games, change their assets. Modding will have a renaissance like never before.

Prediction 4. To push back, everything will move to thin clients.