mrkeen a day ago

  Why compilers are hard – the IR data structure
If you claim an IR makes things harder, just skip it.

  Compilers do have an essential complexity that makes them "hard" [...waffle waffle waffle...]

  The primary data [...waffle...] represents the computation that the compiler needs to preserve all the way to the output program. This data structure is usually called an IR (intermediate representation). The primary way that compilers work is by taking an IR that represents the input program, and applying a series of small transformations all of which have been individually verified to not change the meaning of the program (i.e. not miscompile). In doing so, we decompose one large translation problem into many smaller ones, making it manageable.
There we go. The section header should be updated to:

  Why compilers are manageable – the IR data structure
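
To make the quoted point concrete, here is roughly what "a series of small transformations" over an IR looks like on a toy example. This is a Haskell-ish sketch I made up for this comment (a tiny expression IR plus one constant-folding pass), not anything from TFA or a real compiler:

    -- Toy IR and one meaning-preserving pass (constant folding).
    -- Illustrative sketch only; all names are made up.
    data Expr
      = Lit Int
      | Add Expr Expr
      | Mul Expr Expr
      deriving (Show)

    -- Fold operations whose operands are already literals. Applied
    -- bottom-up, it never changes what the expression evaluates to.
    constFold :: Expr -> Expr
    constFold (Add a b) = foldBin Add (+) (constFold a) (constFold b)
    constFold (Mul a b) = foldBin Mul (*) (constFold a) (constFold b)
    constFold e         = e

    foldBin :: (Expr -> Expr -> Expr) -> (Int -> Int -> Int) -> Expr -> Expr -> Expr
    foldBin _    op (Lit x) (Lit y) = Lit (op x y)
    foldBin ctor _  x       y       = ctor x y

    -- The "compiler" is then just a composition of such small passes.
    pipeline :: Expr -> Expr
    pipeline = constFold   -- a real compiler chains many more passes here

Each pass is small enough to check on its own, which is the whole "manageable" point.
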
WalterBright a day ago

In the D compiler, I realized that while loops could be rewritten as for loops, and so implemented that. The for loops are then rewritten using goto's. This makes the IR a list of expression trees connected by goto's. This data structure makes Data Flow Analysis fairly simple.
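
Roughly the shape of it, as a Haskell-ish sketch (not the actual D compiler code, and it collapses the while->for and for->goto steps into one):

    -- Toy statement IR: expression trees connected by labels and gotos.
    -- Names and types are made up for illustration.
    data Stmt
      = Expr String            -- an expression tree, left opaque here
      | While String [Stmt]    -- while (cond) { body }
      | Label Int
      | Goto Int
      | CondGoto String Int    -- if (cond) goto label
      deriving (Show)

    -- Lower 'while' into label / conditional-goto / goto form,
    -- threading a counter for fresh label numbers.
    lowerWhile :: Int -> [Stmt] -> (Int, [Stmt])
    lowerWhile n [] = (n, [])
    lowerWhile n (While cond body : rest) =
      let top = n
          end = n + 1
          (n',  body') = lowerWhile (n + 2) body
          (n'', rest') = lowerWhile n' rest
      in ( n''
         , [Label top, CondGoto ("!(" ++ cond ++ ")") end]
           ++ body' ++ [Goto top, Label end] ++ rest' )
    lowerWhile n (s : rest) =
      let (n', rest') = lowerWhile n rest
      in  (n', s : rest')

Once everything is in that flat label/goto form, data flow analysis only has to walk a list and follow the gotos.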

I implemented an early function inliner by inlining the IR. When I wrote the D front end, I attempted to do the inlining in the front end instead. That turned out to be a significantly more complicated problem, and in the end not worth it.

The difficulty with the IR versions is that it is impractical to issue error messages in the context of the original parse trees. I.e. it's the ancient "turn the hamburger back into a cow" problem.

UncleEntity a day ago

> Why compilers are manageable – the IR data structure

Yeah, I've been working on an APL interpreter, just for the hell of it, as it's a very interesting problem. Without the IR (or continuation graph, in this case) I can't even imagine how much harder it would be, because you can't really tell what most things are until you evaluate the things around them at runtime -- is it a monadic or dyadic function/operator application, a variable lookup, a function passed as an argument, whatever?

The compiler (parser, really) builds this intricate graph which curries computations, dispatches functions and, eventually, evaluates expressions to get to the result. Aside from defined operators, which get looked up in the environment at parse time, the parser just knows that it has the name of something, with no real clue what it represents, as that changes based on the context.
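
Concretely, something like this (a toy Haskell-ish sketch of the idea, my framing for this comment rather than the actual interpreter): the parser only records "this name, applied in this position", and evaluation decides whether that's monadic or dyadic once the neighbours are known.

    import qualified Data.Map as M

    -- Toy sketch, not the real thing. A value is a number or a function
    -- that may or may not get a left argument; which one it gets is only
    -- known at evaluation time.
    data Val = Num Double | Fn (Maybe Val -> Val -> Val)

    type Env = M.Map String Val

    data Node
      = Leaf Double
      | Name String
      | App Node Node (Maybe Node)   -- function, right arg, optional left arg

    eval :: Env -> Node -> Val
    eval _   (Leaf n)     = Num n
    eval env (Name s)     = env M.! s
    eval env (App f r ml) =
      case eval env f of
        Fn g  -> g (fmap (eval env) ml) (eval env r)
        Num _ -> error "tried to apply a number"

    -- '-' negates when used monadically, subtracts when used dyadically.
    minusFn :: Val
    minusFn = Fn go
      where
        go Nothing        (Num r) = Num (negate r)
        go (Just (Num l)) (Num r) = Num (l - r)
        go _              _       = error "non-numeric argument"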

After the continuation graph is built, it's just graph transformations all the way down, at least in theory, which is what I think TFA was trying to get at, because this is where the dragons be. Allegedly, anyway -- I haven't gotten to the optimizer yet, but I have some ideas, so we'll see...

For some strange reason I have this fascination with CEK machines, so that's what I built. Reading the article, I was thinking how much easier all this would be (for the robots) if we were dealing with a regular old IR (or a more sane language).
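
For anyone who hasn't bumped into the term: a CEK machine is just control (the expression being evaluated), environment, and continuation, stepped in a loop. A textbook-style toy for the call-by-value lambda calculus (a Haskell-ish sketch, not my interpreter) looks like:

    import qualified Data.Map as M

    -- Textbook-style toy, not the interpreter described above.
    data Term = Var String | Lam String Term | App Term Term

    data Val = Clo String Term Env       -- closures are the only values here
    type Env = M.Map String Val

    data Kont
      = Done
      | EvalArg Term Env Kont    -- function value done, evaluate the argument next
      | Apply Val Kont           -- both done, enter the closure body

    step :: (Term, Env, Kont) -> Either Val (Term, Env, Kont)
    step (Var x,   env, k) = continue (env M.! x) k
    step (Lam x b, env, k) = continue (Clo x b env) k
    step (App f a, env, k) = Right (f, env, EvalArg a env k)

    continue :: Val -> Kont -> Either Val (Term, Env, Kont)
    continue v Done                    = Left v
    continue v (EvalArg a env k)       = Right (a, env, Apply v k)
    continue v (Apply (Clo x b env) k) = Right (b, M.insert x v env, k)

    run :: Term -> Val
    run t = loop (t, M.empty, Done)
      where loop s = either id loop (step s)

The continuation is roughly the explicit form of the "what happens next" that the continuation graph I described above encodes.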