jcalvinowens 5 hours ago

How much of this result is effectively plagiarized open source compiler code? I don't understand how this is compelling at all: obviously it can regurgitate things that are nearly identical in capability to already existing code it was explicitly trained on...

It's very telling that these examples are all "look, we made it recreate a shittier version of a thing that already exists in the training set".

Philpax 4 hours ago | parent | next [-]

What Rust-based compiler is it plagiarising from?

rubymamis 4 hours ago | parent | next [-]

There are many, here's a simple Google search:

https://github.com/jyn514/saltwater

https://github.com/ClementTsang/rustcc

https://github.com/maekawatoshiki/rucc

jsnell 4 hours ago | parent | next [-]

Did you actually look at these?

> https://github.com/jyn514/saltwater

This is just a frontend. It uses Cranelift as the backend. It's missing some fairly basic language features like bitfields and variadic functions. And if I'm reading the documentation right, it requires all the source code to be in a single file...

> https://github.com/ClementTsang/rustcc

This will compile basically no real-world code. The only supported data type is "int".

> https://github.com/maekawatoshiki/rucc

This is just a frontend. It uses LLVM as the backend.

luke5441 8 minutes ago | parent | prev | next [-]

Another one:

https://github.com/rustcoreutils/posixutils-rs/tree/main/cc

Philpax 5 minutes ago | parent [-]

Can't compile the Linux kernel, and ironically, also partly written by Claude.

Philpax 4 hours ago | parent | prev | next [-]

Look at what those compilers are capable of compiling and to which targets, and compare it to what this compiler can do. Those are wonderful, and I have nothing but respect for them, but they aren't going to be compiling the Linux kernel.

rubymamis 4 hours ago | parent [-]

I just did a quick Google search only on GitHub, maybe there are better ones out there on the internet?

chilipepperhott 3 hours ago | parent | prev [-]

I found this one too: https://github.com/PhilippRados/wrecc

lossolo 4 hours ago | parent | prev | next [-]

Language doesn't really matter, it's not how things are mapped in the latent space. It only needs to know how to do it in one language.

HDThoreaun 2 hours ago | parent [-]

Ok you can say this about literally any compiler though. The authors of every compiler have intimate knowledge of other compilers, how is this different?

jcalvinowens 4 hours ago | parent | prev [-]

Being written in Rust is meaningless IMHO. There is absolutely zero inherent value to something being written in Rust. Sometimes it's the right tool for the job, sometimes it isn't.

modeless 4 hours ago | parent | next [-]

It means that it's not directly copying existing C compiler code, which is overwhelmingly not written in Rust. Even if your argument is that it is plagiarizing C code and doing a direct translation to Rust, that's a pretty interesting capability for it to have.

seba_dos1 2 hours ago | parent | next [-]

Translating things between languages is probably one of the least interesting capabilities of LLMs - it's the one thing that they're pretty much meant to do well by design.

jcalvinowens 4 hours ago | parent | prev [-]

Surely you agree that directly copying existing code into a different language is still plagiarism?

I completely agree that "rewrite this existing codebase into a new language" could be a very powerful tool. But the article is making much bolder claims. And the result was more limited in capability, so you can't even really claim they've achieved the rewrite skill yet.

Philpax 4 hours ago | parent | prev | next [-]

Please don't open a bridge to the Rust flamewar from the AI flamewar :-)

jcalvinowens 4 hours ago | parent [-]

Hahaha, fair enough, but I refuse to be shy about having this opinion :)

jeroenhd 4 hours ago | parent | prev | next [-]

The fact that it couldn't actually stick to the 16-bit ABI, so it had to cheat and call out to GCC to get the system to boot, says a lot.

Without enough examples to copy from (despite CPU manuals being available in the training set) the approach failed. I wonder how well it'll do when you throw it a new/imaginary instruction set/CPU architecture; I bet it'll fail in similar ways.

jsnell 4 hours ago | parent | next [-]

"Couldn't stick to the ABI ... despite CPU manuals being available" is a bizarre interpretation. What the article describes is the generated code being too large. That's an optimization problem, not a "couldn't follow the documentation" problem.

And it's a bit of a nasty optimization problem, because the result is all or nothing. Implementing enough optimizations to get from 60kB to 33kB is useless; all the rewards come from getting to 32kB.
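
To make the all-or-nothing shape concrete, here is a rough sketch. It assumes the constraint is a hard 32kB ceiling on the generated code, which is only my reading of the figures above; `LIMIT_BYTES` and `fits` are made-up names for illustration, not anything from the article or the compiler.

```python
# Assumption: a hard 32 kB ceiling on generated code size, inferred from the
# "32kB" figure above. LIMIT_BYTES and fits() are hypothetical names.
LIMIT_BYTES = 32 * 1024


def fits(code_size_bytes: int) -> bool:
    """Return True only if the generated code is within the size limit."""
    return code_size_bytes <= LIMIT_BYTES


# The payoff is a step function: shrinking from 60 kB to 33 kB changes nothing.
for size_kb in (60, 33, 32):
    print(f"{size_kb} kB -> fits: {fits(size_kb * 1024)}")
```

Under that assumption, nearly halving the size from 60kB earns exactly the same result as doing nothing, which is why there is no partial credit to guide the optimization work.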

jcalvinowens 4 hours ago | parent | prev [-]

IMHO a new architecture doesn't really make it any more interesting: there are too many examples of adding new architectures in the existing codebases. Maybe if the new machine had some bizarre novel property, I suppose, but I can't come up with a good example.

If the model were retrained without any of the existing compilers/toolchains in its training set, and it could still do something like this, that would be very compelling to me.

anematode 4 hours ago | parent | prev [-]

Honestly, probably not a lot. Not that many C compilers are compatible with all of GCC's weird features, and the ones that are, I don't think are written in Rust. Hell, even clang couldn't compile the Linux kernel until ~10 years ago. This is a very impressive project.