Hendrikto 20 hours ago

TLDR: The tail-calling interpreter is slightly faster than computed goto.

> I used to believe the tail-calling interpreters get their speedup from better register use. While I still believe that now, I suspect that is not the main reason for speedups in CPython.

> My main guess now is that tail calling resets compiler heuristics to sane levels, so that compilers can do their jobs.

> Let me show an example, at the time of writing, CPython 3.15’s interpreter loop is around 12k lines of C code. That’s 12k lines in a single function for the switch-case and computed goto interpreter.

> […] In short, this overly large function breaks a lot of compiler heuristics.

> One of the most beneficial optimisations is inlining. In the past, we’ve found that compilers sometimes straight up refuse to inline even the simplest of functions in that 12k loc eval loop.
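For anyone who hasn't seen the single-function style the quote is talking about, here is a minimal sketch of computed-goto dispatch. The three-opcode bytecode and the name `run_goto` are made up for illustration and have nothing to do with CPython's actual opcodes. It relies on the GCC/Clang "labels as values" extension, so it is not standard C.

```c
#include <stddef.h>

/* Hypothetical three-opcode bytecode, invented for this sketch. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Every opcode body lives inside this one function. In a real
   interpreter this is the function that grows to thousands of lines. */
long run_goto(const unsigned char *pc) {
    /* GCC/Clang extension: &&label yields the address of a label. */
    static void *const targets[] = { &&op_inc, &&op_double, &&op_halt };
    long acc = 0;

/* Fetch the next opcode and jump straight to its label. */
#define DISPATCH() goto *targets[*pc++]

    DISPATCH();

op_inc:
    acc += 1;
    DISPATCH();
op_double:
    acc *= 2;
    DISPATCH();
op_halt:
    return acc;

#undef DISPATCH
}
```

Calling `run_goto` on the program `{ OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT }` walks the labels in order. With only three opcodes the compiler has no trouble here; the quoted argument is about what happens when the same shape is scaled up to around 12k lines.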

kccqzy 18 hours ago

I think in the protobuf example the musttail did in fact benefit from better register use. All the functions are called with the same arguments, so there is no need to shuffle the registers. The same six register-passed arguments are reused from one function to the next.
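A rough sketch of what "same arguments from one function to the next" looks like, assuming a made-up three-opcode bytecode and two arguments instead of protobuf's six. The names `handler_fn`, `run_tail`, and the `MUSTTAIL` macro are hypothetical. The macro expands to `__attribute__((musttail))` where the compiler reports support for it and to nothing otherwise, in which case the tail call is only an optimisation the compiler may or may not perform.

```c
#include <stddef.h>

/* Use the musttail attribute only where the compiler advertises it. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL /* fallback: plain call, tail call not guaranteed */
#endif

/* Hypothetical three-opcode bytecode, invented for this sketch. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Every handler has the same signature, so the interpreter state can
   stay in the same argument registers across each dispatch. */
typedef long handler_fn(const unsigned char *pc, long acc);

static handler_fn op_inc, op_double, op_halt;

static handler_fn *const table[] = { op_inc, op_double, op_halt };

/* Read the next opcode and tail-call its handler with the same state. */
#define NEXT() MUSTTAIL return table[*pc](pc + 1, acc)

static long op_inc(const unsigned char *pc, long acc) {
    acc += 1;
    NEXT();
}

static long op_double(const unsigned char *pc, long acc) {
    acc *= 2;
    NEXT();
}

static long op_halt(const unsigned char *pc, long acc) {
    (void)pc; /* nothing left to decode */
    return acc;
}

/* Entry point: dispatch the first opcode with an empty accumulator. */
long run_tail(const unsigned char *pc) {
    return table[*pc](pc + 1, 0);
}
```

Each handler is its own small function, so the compiler's inlining and register allocation decisions are made per handler rather than across one huge function body, which is the point of the quoted guess above.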

cma 18 hours ago

Does MSVC support computed goto?