mananaysiempre 2 days ago

> This kind of "expected next field" optimization has a long history in protobuf

You could probably even trace the history of the idea all the way to Van Jacobson’s 30-instruction TCP fastpath[1]. Or, to go a bit closer: a stack+accumulator VM is, compared to the pure stack option, prone to blowing up the bytecode count (and thus the dispatch cost) with its constant PUSH-accumulator instructions. I’ve found that an interpreter for such a VM goes significantly faster if you change the (non-shared) dispatch from

  return impl[*pc](pc, ...);

to

  if (*pc == PUSH) {
      do_push(...); pc++;
  }
  return impl[*pc](pc, ...);

which feels somewhat analogous to the next-field optimization and avoids polluting the indirect branch predictor with the very common PUSH predictions. (It’s still slower than not having those PUSHes in the first place.)
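
For concreteness, here is a minimal sketch of that dispatch in C. It is a toy and not the interpreter I measured: every opcode other than PUSH, the handler signature, and the names `dispatch`, `run` and `demo_result` are made up for illustration. It also assumes the compiler inlines `dispatch` into each handler (so the dispatch stays non-shared) and turns the final call into a tail call, neither of which plain C guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy instruction set; only PUSH comes from the comment above. */
enum { OP_HALT, OP_PUSH, OP_LOADI, OP_ADD, OP_MUL, OP_COUNT };
enum { STACK_MAX = 64 };   /* no overflow/underflow checks in this sketch */

/* pc = current instruction, acc = accumulator, sp = first free stack slot */
typedef long handler_fn(const uint8_t *pc, long acc, long *sp);

static handler_fn op_halt, op_push, op_loadi, op_add, op_mul;

static handler_fn *const impl[OP_COUNT] = {
    [OP_HALT] = op_halt, [OP_PUSH] = op_push, [OP_LOADI] = op_loadi,
    [OP_ADD] = op_add,   [OP_MUL] = op_mul,
};

/* Meant to be inlined at the end of every handler. */
static inline long dispatch(const uint8_t *pc, long acc, long *sp)
{
    if (*pc == OP_PUSH) {          /* cheap conditional branch for the common case */
        *sp++ = acc;
        pc++;
    }
    assert(*pc < OP_COUNT);        /* reject unknown opcodes before indexing */
    return impl[*pc](pc, acc, sp); /* indirect branch for everything else */
}

static long op_halt(const uint8_t *pc, long acc, long *sp)
{
    (void)pc; (void)sp;
    return acc;
}

/* Still required: reached when a PUSH directly follows another PUSH. */
static long op_push(const uint8_t *pc, long acc, long *sp)
{
    *sp++ = acc;
    return dispatch(pc + 1, acc, sp);
}

/* LOADI imm8: load an unsigned 8-bit immediate into the accumulator. */
static long op_loadi(const uint8_t *pc, long acc, long *sp)
{
    (void)acc;
    return dispatch(pc + 2, (long)pc[1], sp);
}

/* ADD / MUL: combine the popped top of stack with the accumulator. */
static long op_add(const uint8_t *pc, long acc, long *sp)
{
    acc += *--sp;
    return dispatch(pc + 1, acc, sp);
}

static long op_mul(const uint8_t *pc, long acc, long *sp)
{
    acc *= *--sp;
    return dispatch(pc + 1, acc, sp);
}

long run(const uint8_t *code)
{
    long stack[STACK_MAX];
    return dispatch(code, 0, stack);
}

/* (2 + 3) * 4 */
long demo_result(void)
{
    static const uint8_t prog[] = {
        OP_LOADI, 2, OP_PUSH, OP_LOADI, 3, OP_ADD,
        OP_PUSH, OP_LOADI, 4, OP_MUL, OP_HALT,
    };
    return run(prog);
}
```

Without guaranteed tail calls each instruction nests one more C call, so this is only safe for short programs; a real interpreter would want something like Clang's `musttail` or computed gotos.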

[1] https://www.pdl.cmu.edu/mailinglists/ips/mail/msg00133.html