mananaysiempre 2 days ago
> This kind of "expected next field" optimization has a long history in protobuf

You could probably even trace the history of the idea all the way to Van Jacobson’s 30-instruction TCP fastpath[1]. Or, to go a bit closer, I’ve found that an interpreter for a stack+accumulator VM (which, compared to the pure stack option, is prone to blowing up the bytecode count and thus dispatch cost with the constant PUSH-accumulator instructions) goes significantly faster if you change the (non-shared) dispatch from
to
which feels somewhat analogous to the next-field optimization and avoids polluting the indirect branch predictor with the very common PUSH predictions. (It’s still slower than not having those PUSHes in the first place.)

[1] https://www.pdl.cmu.edu/mailinglists/ips/mail/msg00133.html
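The two dispatch snippets did not survive in this copy of the comment, so here is a rough sketch of what that kind of change could look like. It is not the original code. It assumes GCC/Clang computed goto ("labels as values") and a made-up four-opcode stack+accumulator VM; the opcode names, `run`, and both macros are hypothetical. `DISPATCH_PLAIN` is the usual replicated dispatch with one indirect jump at the end of every handler; `DISPATCH` first checks for PUSH with an ordinary conditional branch and only falls back to the indirect jump otherwise.

```c
#include <stdint.h>

/* Hypothetical opcodes for a tiny stack+accumulator VM. */
enum { OP_PUSH, OP_LOADI, OP_ADD, OP_HALT };

/* Baseline replicated dispatch: every handler ends in its own indirect jump.
 * Shown for comparison only; the interpreter below uses DISPATCH(). */
#define DISPATCH_PLAIN() goto *labels[*ip++]

/* Variant: peel the very common PUSH off with a direct conditional branch,
 * so it does not take up indirect-branch-predictor entries. */
#define DISPATCH()                     \
    do {                               \
        if (*ip == OP_PUSH) {          \
            ip++;                      \
            goto do_push;              \
        }                              \
        goto *labels[*ip++];           \
    } while (0)

/* Runs bytecode until OP_HALT and returns the accumulator.
 * Sketch only: no bounds checks on the stack or the bytecode. */
static int64_t run(const uint8_t *ip)
{
    /* Jump table indexed by opcode (GNU C extension). */
    static void *const labels[] = { &&do_push, &&do_loadi, &&do_add, &&do_halt };
    int64_t stack[64];
    int64_t *sp = stack;
    int64_t acc = 0;

    DISPATCH();

do_push:                      /* push the accumulator onto the stack */
    *sp++ = acc;
    DISPATCH();

do_loadi:                     /* acc = signed 8-bit immediate operand */
    acc = (int8_t)*ip++;
    DISPATCH();

do_add:                       /* acc = popped value + acc */
    acc += *--sp;
    DISPATCH();

do_halt:
    return acc;
}
```

With this encoding, the program `LOADI 2; PUSH; LOADI 3; ADD; HALT` leaves 5 in the accumulator. Whether the extra compare pays off depends on how dominant PUSH is in the bytecode and on the CPU's predictors, so it is worth measuring rather than assuming.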