Remix.run Logo
jhavera 5 hours ago

Interesting timing. We have been working on something that takes the opposite design philosophy. JSIR is designed for high-fidelity round-trips back to source, preserving all information a human author put in. That makes sense when the consumer is a human-facing tool like a deobfuscator or transpiler.

We have been exploring what an IR looks like when the author is an AI and the consumer is a compiler, and no human needs to read the output at all. ARIA (aria-ir.org) goes the other direction from JSIR. No source round-trip, no ergonomic abstractions, but first-class intent annotations, declared effects verified at compile time, and compile-time memory safety.

The use cases are orthogonal. JSIR is the right tool when you need to understand and transform code humans wrote. ARIA is the right tool when you want the AI to skip the human-readable layer entirely.

The JSIR paper on combining Gemini and JSIR for deobfuscation is a good example of where the two worlds might intersect. Curious whether you have thought about what properties an IR should have to make that LLM reasoning more reliable.

oldmanhorton 4 hours ago | parent [-]

> when the author is an AI and the consumer is a compiler, and no human needs to read the output at all.

This seems like a big bet on the assumption that fully autonomous codegen without humans in the loop is imminent if not already present - frankly, I hope you are wrong.

Even if that comes to pass in some cases, I also find it hard to believe that an LLM will ever be able to generate code in any new language at the same level with which it can generate stack overflow-shaped JavaScript and python, because it’ll never have as robust of a training set for new languages.

measurablefunc 2 hours ago | parent [-]

I recently wrote a simple interpreter for a stack based virtual machine for a Firefox extension to do some basic runtime programming b/c extensions can't generate & evaluate JavaScript at runtime. None of the consumer AIs could generate any code for the stack VM of any moderate complexity even though the language specification could fit on a single page.

We don't have real AI & no one is anywhere near anything that can consistently generate code of moderate complexity w/o bugs or accidental issues like deleting files during basic data processing (something I ran into recently while writing a local semantic search engine for some of my PDFs using open source neural networks).