It's worth noting here that the author came up with a handful of good heuristics to guide Claude and had a very specific goal, and the LLM did a good job given those constraints. Most seasoned reverse engineers I know have found similar wins with those in place.

What LLMs are (still?) not good at is one-shot reverse engineering for understanding by a non-expert. If that's your goal, don't blindly use an LLM. People already know that having an LLM write prose or code you can't evaluate yourself is bad, but it's worth remembering that doing this for decompilation is even harder :)

zdware 4 hours ago

Agree with this. I'm a software engineer who hasn't had to manage memory for most of my career.

I asked Opus how hard it would be to port the script extender for Baldur's Gate 3 from Windows to the native Linux build. It said this would be very difficult for someone without reverse engineering experience, and correctly pointed out that the two builds use different compilers, so it's not a simple mapping exercise. Its recommendation was not to try unless I was a Ghidra master and had lots of time on my hands.

dimitri-vs 3 hours ago

FWIW most LLMs are pretty terrible at estimating complexity. If you've used Claude Code for any length of time, you might be familiar with its plan "timelines", which always span many days, yet medium-sized projects get implemented in about an hour.

I've had CC build semi-complex Tauri, PyQt6, Rust and SvelteKit apps for me without my ever having touched those stacks. Is the code quality good? Probably not. But all those apps were local-only tools or had fewer than 10 users, so it doesn't matter.

	zdware 3 hours ago
		That's fair, I've had similar experiences working in other stacks with it, though with some niche stacks it seems to struggle more. Definitely agree that the narrower the context/problem statement, the higher the chance of success. For this project, it described its reasoning well, and given my own skillset and some surface-level info on how one would start, it raised many good points that made the project unrealistic for me.
	hobs an hour ago
		Disagree - the timelines are completely reasonable for an actual software project, and that's what the training data is based on, not projects written with LLMs.

ph4evers 5 hours ago

Are they not performing well because they are trained to be more generic, or is the task too complex? It seems like a cheap problem to fine-tune for.

	pixl97 5 hours ago
		Sounds like a more agentic pipeline task. Decompile, assess, explain.