ivankra 2 days ago

> The initial naive application didn’t even yield much gains. Only after a bunch of optimizations that it really shines: a 30-46% speedup compared to the computed goto interpreter.

Looks like quite a lot of complexity for such a gain. 30-40% is roughly what context threading would buy you [1]. It takes relatively little code to implement - only do honest assembly for jumps and conditional branches; for other opcodes, just emit a call to the interpreter's handler. Reportedly, it took Apple just 4k LOC to ship the first JIT like that in JavaScriptCore [2].

Also, if you haven't seen it, musttail + preserve_none is a cool new dispatch technique to get more mileage out of plain C/C++ before turning to hand-coded assembly/JIT [3]. A step up from computed goto.

[1] https://webdocs.cs.ualberta.ca/~amaral/cascon/CDP05/slides/C...

[2] https://webkit.org/blog/214/introducing-squirrelfish-extreme...

[3] https://godbolt.org/z/TPozdErM5

blakepelton 2 days ago | parent [-]

I wonder how tricks that rely on compiler extensions (e.g., computed goto, musttail, and preserve_none) compare against the weval transform? The weval transform involves a small language extension backed by a larger change to the compiler implementation.

I suppose the downside of the weval transform is that it is only helpful for interpreters, whereas the other extensions could have other use cases.

Academic paper about weval: https://dl.acm.org/doi/pdf/10.1145/3729259

My summary of that paper: https://danglingpointers.substack.com/p/partial-evaluation-w...

ivankra 2 days ago | parent | next [-]

Well, runtime/warmup cost seems like one obvious downside to me - weval adds non-trivial compilation overhead to your interpreter (unrolling the interpreter loop, dead code elimination, optimizing across opcode boundaries - probably a major source of the speedup). That's great if you have the time to precompile your script - you only pay those costs once. It also helps if your host language's runtime ships with an optimizing compiler/JIT you can piggyback on (the WASM runtime in weval's paper, the JVM in Graal's case) - these things take space.

But sometimes you just have a huge pile of code that isn't hot enough to be worth optimizing, and you'd be better off with a basic interpreter (which can still benefit from computed gotos or tail-call dispatch at zero runtime overhead). Octane's CodeLoad or TypeScript benchmarks are such examples - GraalJS does pretty poorly there.

naasking 2 days ago | parent | prev [-]

Partial evaluation subsumes a lot of other compiler optimizations, like constant folding, inlining, and dead code elimination, so it wouldn't just find application in interpreters.