| ▲ | adrian_b 5 hours ago | |
The same is true for a "switch" statement. If the "case" values used in a "switch" are sparse, the "switch" will not be compiled into an indexed jump instruction, but into a sequence of conditional jump instructions, which test all the possible cases. In this situation, the alternative to "switch" for implementing dispatching is no longer a computed "goto", but a multiple "if"/"else" sequence. A smarter compiler could detect when a "switch" forms the body of a loop and it would replicate the indexed jump instruction at the end of each case, instead of jumping to the beginning of the loop to execute there an indexed jump, or even worse, first jumping to the end of the loop to terminate the "switch", then jumping to the beginning of the loop to repeat the loop body. With such a compiler, computed "goto" would not be necessary as an alternative to "switch". The range check inside the dispatch loop would not be necessary if the opcodes had an enumeration type (in a programming language where enumeration types are clearly distinct from integers) and the "switch" handled all the possible cases. In that situation, the range checks would be moved elsewhere in the program, wherever opcodes are generated. | ||
| ▲ | wahern 2 hours ago | parent [-] | |
The GCC documentation example for computed goto uses an indexing table, but you can of course use computed goto addresses directly. The problem is smuggling the label addresses outside the function so you don't need the enum/index indirection during dispatch. I once experimented with some techniques and found you can use constructors, either on function-scoped static variables in C++ or using the constructor function attribute on a nested function in GCC C. The addresses are visible to the constructor function and you can smuggle them by copying through a file- or global- scoped array, then at runtime initialize your data structures and bytecode arrays with the computed goto addresses. That preserves the CPUs branch prediction and prefetching resources for other work. You can also lazily initialize things, which works well if you're not precompiling bytecode but implementing something like a coroutine or state machine where you just need to initialize the first dispatch address. Even on modern chips avoiding computed goto indexing can be meaningful (5+%) depending on use case, but the marginal gains from the smuggling shenanigans specifically probably aren't worth the code complexity and portability headaches (clang doesn't support nested functions in C), though I never tried that specific trick with a full-blown bytecode VM. These days with guaranteed tailcall extensions you can just dispatch on function addresses (knowable before dispatch), though you can't directly control inlining optimizations so if you're mostly doing simple work per operation computed gotos might still have better performance. And for small or straight-forward state machines code can be easier to grok when not split across many functions, especially if they're tailcalling each other through pointers. | ||