Remix.run Logo
mananaysiempre 3 days ago

> More recently, we've moved towards making Protobuf parsing more data-driven, where each field's schema is compiled into data that is passed as an argument to a generic Protobuf parser function. We call this "table-driven parsing", and from my read of the blog article, I believe this is what Miguel is doing with hyperpb.

Everything old is new again, I guess—one of the more advertised changes in Microsoft COM as it was maturing (circa 1995) was that you could use data-driven marshalling with “NDR format strings” (bytecode, essentially[1,2]) instead of generating C code. Shortly after there was typelib marshalling (format-compatible but limited), and much later also WinRT’s metadata-driven marshalling (widely advertised but basically completely undocumented).

Fabrice Bellard’s nonfree ASN.1 compiler[3] is also notable for converting schemas into data rather than code, unlike most of its open-source alternatives.

I still can’t help wondering what it is, really, that makes the bytecode-VM approach advantageous. In 1995, the answer seems simple: the inevitable binary bloat was unacceptable for a system that needs to fit into single-digit megabytes of RAM; and of course bytecode as a way to reduce code footprint has plenty of prior art (microcomputer BASICs, SWEET16, Forth, P-code, even as far back as the AGC).

Nowadays, the answer doesn’t seem as straightforward. Sure, the footprint is enormous if you’re Google, but you’re not Google (you’re probably not even Cloudflare), and besides, I hope you’re not Google and can design stuff that’s actually properly adapted to static linking. Sure, the I$ pressure is significant (thinking back to Mike Pall’s explanation[4] why he didn’t find a baseline JIT to be useful), but the bytecode interpreter isn’t going to be a speed demon either.

I don’t get it. I can believe it’s true, but I don’t really feel that I get it.

[1] https://learn.microsoft.com/en-us/windows/win32/rpc/rpc-ndr-...

[2] https://gary-nebbett.blogspot.com/2020/04/rpc-ndr-engine-dce...

[3] https://bellard.org/ffasn1/

[4] https://news.ycombinator.com/item?id=14460027

anonymoushn 2 days ago | parent [-]

In some workloads, you can benefit from paying the latency cost of data dependencies instead of the latency cost of conditional control flow, but there's no general rule here. It's best to try out several options on the actual task and the actual production data distribution.