pizlonator 9 hours ago

Compiler writer here.

This post makes it seem like the pass ordering problem is bigger than it really is and then overestimates the extent to which egraphs solve it.

The pass ordering problem isn’t a big deal except maybe in the interaction of GVN (global value numbering) and load elimination, but in practice that ends up being a non-issue because it’s natural to make those be the same pass.
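A toy sketch of why combining the two helps (purely illustrative; this is not B3's or Cranelift's actual code, and the IR encoding is made up): forwarding a load immediately exposes value-numbering opportunities in the same walk, so neither pass has to wait for the other.

```python
# Toy combined GVN + load elimination over a straight-line IR.
# Instructions are (id, opcode, *args) tuples; illustrative only.

def gvn_with_load_elim(insns):
    """One pass that value-numbers pure ops AND forwards loads.

    Doing both in the same walk sidesteps the ordering problem:
    forwarding a load can expose identical pure ops to GVN, and
    value-numbering can expose identical loads to forwarding.
    """
    numbering = {}      # (op, args) key -> defining instruction id
    memory = {}         # address value -> last known stored/loaded value
    replacement = {}    # instruction id -> value it is replaced by
    out = []

    def resolve(v):
        return replacement.get(v, v)

    for insn_id, op, *args in insns:
        args = tuple(resolve(a) for a in args)
        if op == "store":                 # store addr, value
            addr, value = args
            memory = {addr: value}        # conservatively clobber other addrs
            out.append((insn_id, op, *args))
        elif op == "load":                # load addr
            addr, = args
            if addr in memory:            # forward the prior store/load
                replacement[insn_id] = memory[addr]
                continue
            memory[addr] = insn_id
            out.append((insn_id, op, addr))
        else:                             # pure op: value-number it
            key = (op, args)
            if key in numbering:
                replacement[insn_id] = numbering[key]
                continue
            numbering[key] = insn_id
            out.append((insn_id, op, *args))
    return out, replacement
```

For example, given `store p, v1; v3 = load p; v4 = add v3, z; v5 = add v1, z`, forwarding the load rewrites `v4`'s operand to `v1`, which is exactly what lets GVN prove `v4` and `v5` identical in the same pass.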

Aside from that, pass ordering isn’t a source of sweat for me or my colleagues. Picking the right pass order is fun and easy compared to the real work (designing the IR and writing the passes).

When pass ordering does come to bite you, it’s in a way that egraphs won’t address:

- You need to run some pass over a higher-level IR and some other pass over a lower-level IR (i.e., later), and then you discover that the higher-level pass loses information needed by the lower-level one. That sucks, but egraphs won’t help.

- You might have some super fancy escape analysis and some super fancy type inference that ought to be able to help each other but can only do so to a limited extent because they’re expensive to run repeatedly to fixpoint and even then they can’t achieve optimality. Both of them are abstract interpreters from hell. Too bad so sad - your only full solution is creating an even more hellish abstract interpreter. Egraphs won’t help you.

- Your tail duplication reveals loops so you want to run it before loop optimization. But your tail duplication also destroys loops so you want to run it after loop optimization. No way around this if you want to do fully aggressive taildup. I end up just running it late. I don’t think egraphs will help you here either.
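A hypothetical illustration of the tension in that last bullet (block names and IR syntax invented for this sketch):

```
; Before tail duplication: the back edge is hidden behind the shared
; block "join", so the loop optimizer doesn't see a clean natural loop.
header:  ...                    ; loop work
         br join
other:   ...                    ; unrelated path
         br join
join:    cond = ...
         br cond ? header : exit

; After duplicating "join" into each predecessor, header -> join1 ->
; header is an obvious natural loop, so taildup wants to run *before*
; loop opts. But duplicating the block that closes an already-formed
; loop can split the body across copies and destroy the structure the
; loop optimizer relied on, so it also wants to run *after*.
```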

Worth noting that the first problem is sort of theoretical to me, in the sense that I’ve always found some ordering that just works. The second problem happens, but when it does happen, it’s reasonable to have high level and low level versions of the optimizations and just do both (like how the FTL does CSE in three IRs). The last problem is something I still think about.
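The "run both analyses repeatedly to fixpoint and still not reach optimality" situation in the second bullet can be pictured with a toy round-robin driver (illustrative only; real escape/type analyses are vastly richer than these boolean stand-ins):

```python
# Running two interdependent analyses round-robin to a joint fixpoint.
# Purely illustrative; names and facts are invented for this sketch.

def joint_fixpoint(facts, analyses, max_rounds=10):
    """Re-run each analysis, feeding it the others' facts, until stable.

    Even at the joint fixpoint this can miss results a single combined
    analysis would find: each analysis only consumes the other's final
    facts, not the conditional reasoning behind them.
    """
    for _ in range(max_rounds):
        changed = False
        for analysis in analyses:
            new = analysis(facts)
            if new != facts:
                facts, changed = new, True
        if not changed:
            break
    return facts

# Toy mutual dependence: type info refines escape info and vice versa.
def type_inference(f):
    f = dict(f)
    if f["no_escape"]:          # a non-escaping object has a stable type
        f["type_known"] = True
    return f

def escape_analysis(f):
    f = dict(f)
    if f["type_known"] or f["local_only"]:
        f["no_escape"] = True   # a known type lets us prove non-escape
    return f
```

Here it takes two rounds for `local_only` to propagate through escape analysis into type inference, which is the repeated-to-fixpoint cost the comment describes.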

cfallin 8 hours ago | parent | next [-]

Hi Fil -- thanks for the comment!

I think we may be playing in slightly different spaces: unlike a JS JIT, Cranelift doesn't have "super fancy escape/type analysis". We're really targeting the core optimizations (GVN, LICM, constant propagation, redundant load elimination (RLE), store-to-load forwarding, various algebraic rules) in a fast compile pass. The RLE+GVN interaction was pretty real in our case, as were interactions between the algebraic rewrites and GVN.

You'll note that my main point is that the single fixpoint loop for all of the core rewrites is what we wanted, and what the sea-of-nodes-with-CFG gets us; the egraph (multiple versions of one value) is kind of an aside. One could say: well, sure, but I could just run that fixpoint loop without all the egraph stuff; and yes, that's what our single rewrite pass is.
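That single fixpoint loop can be sketched like this (an illustrative toy, not Cranelift's actual rewrite pass; the rules and expression encoding are invented): every rule sees every other rule's output for free, so there is no ordering decision among them.

```python
# One fixpoint loop driving several "core rewrite" rules at once.
# Expressions are nested tuples like ("add", 1, -1); illustrative only.

def simplify(expr, rules):
    """Apply every rule to every subexpression until nothing changes."""
    changed = True
    while changed:
        expr, changed = rewrite_once(expr, rules)
    return expr

def rewrite_once(expr, rules):
    """One bottom-up pass: rewrite children, then try each rule."""
    if not isinstance(expr, tuple):
        return expr, False
    op, *args = expr
    changed = False
    new_args = []
    for a in args:
        a2, c = rewrite_once(a, rules)
        new_args.append(a2)
        changed |= c
    expr = (op, *new_args)
    for rule in rules:
        out = rule(expr)
        if out is not None:
            return out, True
    return expr, changed

# Two toy rules that only pay off in combination:
def fold_constants(e):
    if isinstance(e, tuple) and e[0] == "add" and all(isinstance(a, int) for a in e[1:]):
        return e[1] + e[2]
    return None

def mul_by_zero(e):
    if isinstance(e, tuple) and e[0] == "mul" and 0 in e[1:]:
        return 0
    return None
```

For `("mul", ("add", 1, -1), "x")`, constant folding exposes the zero that the multiply rule needs; in a phase-ordered world you'd have to decide which pass runs first, and in the fixpoint loop you don't.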

titzer 8 hours ago | parent | next [-]

Thanks for writing the article, btw. I didn't have a chance to go through the whole thing yet.

Did you have a chance to study Graal's IR? It is a hybrid between sea of nodes and CFG; it can contain some "fixed" nodes that can be wired into basic blocks. It can also be relaxed and have nearly everything floating.

TurboFan's IR was very close to C2, but it had even more things that could float. E.g. a pure operation could be lowered to a subgraph with internal control. TurboFan's schedule could move entire floating control islands and attach them to the main CFG skeleton, or remove them entirely.

I'm working on a new IR and I'll be able to share more soon.

cfallin 8 hours ago | parent [-]

Thanks! I haven't studied Graal's IR in detail, no. I'll add it to my reading list...

pizlonator 8 hours ago | parent | prev [-]

> I think we may be playing in slightly different spaces: unlike a JS JIT

My B3 compiler is a direct equivalent of Cranelift.

Sure, I've also written JS JITs. B3 was initially the backend of a JS JIT, but now does other things too. And I've done a lot of LLVM work.

Generally, it's unwise to assume you know what another person has or hasn't done. I'm not making that assumption about you. (I don't even know you, and that doesn't matter for the purposes of this discussion.)

> single fixpoint loop for all of the core rewrites is what we wanted, and what the sea-of-nodes-with-CFG gets us

It just feels so constraining. Eventually you'll want optimizations that can't be expressed with egraphs, and I'm not convinced that your approach conserves code or complexity even if you are sticking to egraphable optimizations.

cfallin 8 hours ago | parent [-]

OK, cool. I was assuming "escape analysis and type inference" implied a JS JIT -- straight from your comment, no other assumptions intended. But you've got a lot of interesting experience here and thanks for your thoughts.

All the best!

diamondlovesyou 3 hours ago | parent | prev | next [-]

> This post makes it seem like the pass ordering problem is bigger than it really is and then overestimates the extent to which egraphs solve it.

It isn't so much of a problem for state-of-the-art implementations like LLVM, but it is for high-level IRs like those present in MLIR. In LLVM, you're basically always in the same representation, and every pass operates on that shared representation. But even then, this is not quite true. For example, SLP vectorization in LLVM is one of the last passes because running SLP before most "latency-sensitive cleanups" would break most of them.

In particular, high-level-to-low-level lowering pipelines suffer very heavily from the ordering concerns.
