jamesu 2 hours ago

There is no way you could recreate a convincing enough 90s-era codebase of a Japanese video game, plus its associated tools, scripts and commented-out codepaths, with current AI tools.

sigmoid10 2 hours ago | parent | next [-]

I wouldn't be too sure about that. The original decompilations of Mario 64 and Ocarina of Time were done mostly by hand because LLMs weren't really around yet, but these kinds of projects seem perfectly suited for handing the gritty work off to AI: There is a clear output (exact binary recreation) and a straightforward path to get there (look at this assembly code and produce some C code from it). The decompilation of Twilight Princess jumped from very little to basically 100% of core code in the past year alone: https://github.com/zeldaret/tp

I have no doubt that this would be possible for MGS2 as well.
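To make the "clear output" point concrete, here is a minimal sketch of the kind of byte-for-byte check a matching decompilation relies on. The byte strings and the `compare_function_bytes` helper are made up for illustration and are not taken from the zeldaret tooling; a real project would extract the bytes from the original binary and from the freshly compiled object file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchResult:
    matches: bool
    first_diff: Optional[int]  # byte offset of the first mismatch, None if identical


def compare_function_bytes(target: bytes, candidate: bytes) -> MatchResult:
    """Compare original machine code with the code compiled from a candidate C function."""
    for offset, (a, b) in enumerate(zip(target, candidate)):
        if a != b:
            return MatchResult(False, offset)
    if len(target) != len(candidate):
        # One is a prefix of the other: the mismatch starts where the shorter one ends.
        return MatchResult(False, min(len(target), len(candidate)))
    return MatchResult(True, None)


# Hypothetical bytes standing in for one function (illustrative values only).
original = bytes([0x8B, 0x44, 0x24, 0x04, 0x03, 0x44, 0x24, 0x08, 0xC3])
# A candidate whose two operands were loaded in the opposite order.
attempt = bytes([0x8B, 0x44, 0x24, 0x08, 0x03, 0x44, 0x24, 0x04, 0xC3])

result = compare_function_bytes(original, attempt)
print(result.matches, result.first_diff)
```

Because the check is a plain pass/fail with an offset, an agent (or a human) can loop on it: edit the C, recompile, compare, repeat until the function matches.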

SpecialistK 2 hours ago | parent | next [-]

Keep your eyes open for Sonic R too. Sadly a lot of the online Sonic community has been toxic to the dev for being transparent about using Claude for the majority of the disassembly. This is even though he's a very talented developer with plenty of credits to his name, and the work only took a few weeks compared to a year or more if done fully by hand.

InvisibleUp an hour ago | parent | next [-]

I followed his bsky during the announcement, and he started off pre-emptively dissing haters that... didn't even exist yet. He was constantly posting memes about how everyone was dissing him and how AI was totally superior (and then posting his angry sessions with Claude when it got something wrong), when most other users were just "that's cool man". The thing that made him quit bsky was a (now-deleted) thread someone posted criticizing the weird crash-outs. I think if he had been more... normal about the whole thing, people would have received the project quite a bit more positively.

paavohtl an hour ago | parent | prev | next [-]

I don't think it's impossible, but it would take a lot of time and a lot of money; likely more time than good enough models have been commercially available.

I have been working on an incremental decompilation-based reimplementation (basically how OpenRCT2 was done) of Worms Armageddon for the past 2 months with a lot of help from LLM tools, primarily Claude Code and Ghidra MCP. I've worked on it almost every day, reaching Claude Code Max 5x's 5-hour session limit multiple times every day. Suffice it to say that, as a software-rendered, sprite-based 90s PC game, Worms Armageddon is several orders of magnitude simpler than MGS2. Despite that, I think it will be 2-3 more months of work before I can compile a fully independent version of the game.

This is despite the game being an almost ideal candidate for automated RE, as it uses deterministic game logic with built-in checksum checks in replays and multiplayer. I've downloaded all the speedruns I could find for the game (as replay files) and I've retrofitted the replay system into a massively parallel test framework, which simulates over 600 games in about 30 seconds. So Claude can port all game logic independently without much need for manual testing; the replay tests can almost guarantee perfect correctness.
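For readers wondering what such a replay-driven test looks like in principle, here is a minimal sketch. It is not the OpenWA framework: the `Replay` record, the toy `simulate` function and the arithmetic "checksum" are all assumptions made for illustration. The only idea carried over is that a deterministic simulation plus a recorded checksum gives an automatic pass/fail for every ported function.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Replay:
    name: str
    seed: int
    inputs: Tuple[int, ...]   # one input code per simulation tick
    expected_checksum: int    # value recorded by the original game


def simulate(seed: int, inputs: Sequence[int]) -> int:
    """Toy deterministic game logic (stands in for the original binary)."""
    state = seed
    for tick, code in enumerate(inputs):
        state = (state * 31 + code * (tick + 1)) % 1_000_003
    return state  # the final state doubles as the toy checksum


def buggy_port(seed: int, inputs: Sequence[int]) -> int:
    """A port that forgot the tick weighting, a plausible porting mistake."""
    state = seed
    for code in inputs:
        state = (state * 31 + code) % 1_000_003
    return state


def run_replays(replays: Sequence[Replay],
                simulate_fn: Callable[[int, Sequence[int]], int],
                workers: int = 4) -> List[str]:
    """Return the names of replays whose checksum diverges from the recording."""
    def check(replay: Replay) -> Tuple[str, bool]:
        return replay.name, simulate_fn(replay.seed, replay.inputs) == replay.expected_checksum

    # Threads keep the sketch simple; a CPU-bound harness would likely use processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, replays))
    return [name for name, ok in results if not ok]


def record(name: str, seed: int, inputs: Sequence[int]) -> Replay:
    """Record a replay by running the reference simulation once."""
    return Replay(name, seed, tuple(inputs), simulate(seed, inputs))


replays = [record("tiny", 1, [2, 3])]
replays += [record(f"run{i}", i, [(i * 7 + t) % 5 for t in range(100)]) for i in range(20)]

print("reference failures:", run_replays(replays, simulate))
print("buggy port failures:", len(run_replays(replays, buggy_port)))
```

The design choice worth noting is that the harness never needs to know what the game logic means; it only compares a number against a recording, which is what lets an agent port code without a human play-testing each change.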

MGS2 doesn't have anything like that, so every ported function requires extensive manual testing. Even with LLM tools, an accurate decomp could take years (unless you're willing to spend thousands of $currency per month on it).

networked 22 minutes ago | parent [-]

This is really cool! Your process is compelling, and your choice of game is excellent. I'd like to read a long blog post about your entire journey from the beginning to a working binary once you get there.

For those wondering, there is a public Git repository at https://github.com/paavohuhtala/OpenWA.

paavohtl 15 minutes ago | parent [-]

As it happens I do have the habit of writing very long blog posts - though none on OpenWA so far. The OpenWA readme file serves as a bit of an introduction, though it's already a month old.

AshamedCaptain an hour ago | parent | prev | next [-]

Decompilation to C (and even C++!) has been done automatically for 2-3 decades at least. I am not sure what has changed in recent years other than people playing fast and loose with copyright (and GitHub allowing it, likely because their LLMs also stand to benefit). Introducing LLMs here is only going to introduce errors, delays and likely push you away from a reliable result.

The challenge here is readability. Reading the TP source leak you link I think it's even behind the current state of the art, as it's barely above assembly. This is where I suspect even the smallest of LLMs may help, since you don't care that much if it introduces errors.

jamesu 2 hours ago | parent | prev | next [-]

My take was more along the lines of: it wouldn't be convincing enough, if anything it would be too clean and perfect.

Andrex an hour ago | parent | prev [-]

Does the TP decomp use AI to achieve their speed?

CamperBob2 2 hours ago | parent | prev | next [-]

That's pre-2026 thinking. At this point, with the ability to lash IDA or similar tools to an agentic harness, there is no longer any such thing as a closed-source binary.

wuschel 2 hours ago | parent [-]

What is the state of the art of compilers here? What size of project are we talking about here?

What is the experience with faulty decompilation, and with the existence of bugs in the binary?

Could one decompile a binary to a more modern language than C?

diath 2 hours ago | parent | prev [-]

Absolutely. These are just the delusions of a vibe coder at best. Not just with the current generation of AI tools, but essentially never. The conversion from C, C++, Rust or whatever, through preprocessing (macros etc.), through IR generation, through compile-time optimizations, through link-time optimizations, to the generated machine code is a one-way street for low-level languages. You can get a pretty close higher-level approximation that matches the flow/logic/structure, but the code will never be anywhere near close to the original source code. I could write the same C++ program in 3 different ways and get identical assembly, so how do you go back to the exact source? The answer is that you don't.

Here's the same simple program, written in 3 different ways, producing identical binary compatible code: https://godbolt.org/z/qWrc8fEnn

How does the AI know whether it should produce back the snippet #1, #2 or #3? It does not. It cannot.
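The many-to-one problem can be sketched with a rough Python analogy. This is not the linked Compiler Explorer snippet, and Python does not produce machine code the way a C++ compiler does; the three functions below are hypothetical and only show that observing behavior cannot tell you which source form was the original.

```python
def sum_loop(n: int) -> int:
    """Variant 1: explicit accumulator loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_builtin(n: int) -> int:
    """Variant 2: the same computation through a built-in."""
    return sum(range(n))


def sum_formula(n: int) -> int:
    """Variant 3: closed form, guarded so non-positive n matches the loops."""
    return n * (n - 1) // 2 if n > 0 else 0


variants = [sum_loop, sum_builtin, sum_formula]

# Every variant agrees on every tested input, so an observer of the outputs
# has no way to tell which of the three was "the original source".
all_agree = all(
    len({f(n) for f in variants}) == 1
    for n in range(-5, 200)
)
print(all_agree)
```

Whether a given compiler actually collapses equivalent C++ variants into identical machine code depends on the compiler and its optimization flags.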

CamperBob2 36 minutes ago | parent [-]

Who cares? Who said anything about recreating the exact code? You will get usable, compilable, and surprisingly readable source code, in your language of choice, that yields the functional equivalent of the binary.

Barring obvious edge cases that could show up but don't usually, like intentional race conditions. Timing is the one area where things get iffy.

diath 21 minutes ago | parent | prev [-]

> Who said anything about recreating the exact code?

The person I'm replying to? Who said you will get the same code as if it were the original source?