haberman 3 days ago
I think I can shed some light on this, as the creator and lead of upb. Calling a Protobuf parser an "interpreter VM" is a little bit of rhetorical flourish. It comes from the observation that there are some deep structural similarities between the two, which I first described in an article a few years back: https://blog.reverberate.org/2021/04/21/musttail-efficient-i...

> It may seem odd to compare interpreter loops to protobuf parsers, but the nature of the protobuf wire format makes them more similar than you might expect. The protobuf wire format is a series of tag/value pairs, where the tag contains a field number and wire type. This tag acts similarly to an interpreter opcode: it tells us what operation we need to perform to parse this field's data. Like interpreter opcodes, protobuf field numbers can come in any order, so we have to be prepared to dispatch to any part of the code at any time. This means that the overall structure of a protobuf parser is conceptually a while() loop surrounding a switch() statement, just like a VM interpreter.

The tricky part is that the set of "case" labels for a Protobuf parser is message-specific and defined by the fields in the schema. How do we accommodate that? The traditional answer was to generate a function per message and use the schema's field numbers as the case labels. You can see an example of that here (in C++): https://github.com/protocolbuffers/protobuf/blob/f763a2a8608...

More recently, we've moved towards making Protobuf parsing more data-driven, where each field's schema is compiled into data that is passed as an argument to a generic Protobuf parser function. We call this "table-driven parsing", and from my read of the blog article, I believe this is what Miguel is doing with hyperpb.

The trick then becomes how to make this table-driven dispatch as fast as possible, to simulate what the switch() statement would have done. That question is what I cover at length in the article mentioned above.
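To make the analogy concrete, here is a minimal sketch of the while()-around-switch() shape (not upb's actual code): a hypothetical two-field message `message Test { int32 a = 1; int32 b = 2; }` parsed by dispatching on the tag, with the tag playing the role of an interpreter opcode.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical message, invented for this sketch:
 *   message Test { int32 a = 1; int32 b = 2; }
 * A real parser must also handle 64-bit varints, length-delimited
 * fields, unknown-field skipping, and malformed input. */
typedef struct { int32_t a, b; } Test;

/* Decode one varint (sketch: capped at 32 bits, no overflow checks). */
static const uint8_t *read_varint32(const uint8_t *p, const uint8_t *end,
                                    uint32_t *out) {
  uint32_t val = 0;
  int shift = 0;
  while (p < end && shift < 32) {
    uint8_t byte = *p++;
    val |= (uint32_t)(byte & 0x7f) << shift;
    if (!(byte & 0x80)) { *out = val; return p; }
    shift += 7;
  }
  return NULL; /* truncated or too long */
}

/* The while()-around-switch() shape: each case label is one
 * field's "handler", keyed by (field_number << 3 | wire_type). */
int parse_test(Test *msg, const uint8_t *p, size_t len) {
  const uint8_t *end = p + len;
  while (p < end) {
    uint32_t tag, val;
    p = read_varint32(p, end, &tag);
    if (!p) return 0;
    switch (tag) {
      case (1 << 3) | 0: /* field 1, wire type VARINT */
        p = read_varint32(p, end, &val);
        if (!p) return 0;
        msg->a = (int32_t)val;
        break;
      case (2 << 3) | 0: /* field 2, wire type VARINT */
        p = read_varint32(p, end, &val);
        if (!p) return 0;
        msg->b = (int32_t)val;
        break;
      default:
        return 0; /* a real parser would skip unknown fields */
    }
  }
  return 1;
}
```

The generated-code approach emits one such function per message; the table-driven approach replaces the switch with data describing each field.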
anonymoushn 3 days ago
Really great. I wonder, for the "types encoded as code" approach, is there any benefit to fast paths for data with fields in ascending order? For some JSON parsers with types encoded as code, I have observed some speedup from either hard-coding a known key order, or assuming keys arrive in some order and providing a fallback in case an unexpected key is encountered.

For users who are stuck with protobuf forever because various services use it and lots of data is already encoded this way, the historical data could plausibly be canonicalized and written back in large chunks as it is accessed, so that one need not pay the entire cost of canonicalizing it all at once. But of course the icache concerns are still just as bad.
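Such a fast path might look like the following sketch (the two-field message and all names are invented; this is not hyperpb's or upb's code): speculate that fields arrive in ascending order with one-byte tags, and restart in the general dispatch loop on any miss.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical two-field message, invented for this sketch:
 *   message Pair { int32 a = 1; int32 b = 2; } */
typedef struct { int32_t a, b; } Pair;

/* Varint decoder (32-bit cap, no overflow handling in this sketch). */
static const uint8_t *read_varint32(const uint8_t *p, const uint8_t *end,
                                    uint32_t *out) {
  uint32_t val = 0;
  int shift = 0;
  while (p < end && shift < 32) {
    uint8_t byte = *p++;
    val |= (uint32_t)(byte & 0x7f) << shift;
    if (!(byte & 0x80)) { *out = val; return p; }
    shift += 7;
  }
  return NULL;
}

/* General path: the usual tag-dispatch loop, accepting fields in
 * any order. */
static int parse_general(Pair *m, const uint8_t *p, const uint8_t *end) {
  while (p < end) {
    uint32_t tag, v;
    p = read_varint32(p, end, &tag);
    if (!p) return 0;
    switch (tag) {
      case (1 << 3) | 0: /* a */
        p = read_varint32(p, end, &v);
        if (!p) return 0;
        m->a = (int32_t)v;
        break;
      case (2 << 3) | 0: /* b */
        p = read_varint32(p, end, &v);
        if (!p) return 0;
        m->b = (int32_t)v;
        break;
      default:
        return 0;
    }
  }
  return 1;
}

/* Fast path: speculate that the serializer wrote a then b, in
 * ascending field order with one-byte tags; on a miss, fall back. */
int parse_pair(Pair *m, const uint8_t *p, size_t len) {
  const uint8_t *end = p + len;
  uint32_t v;
  if (p < end && p[0] == ((1 << 3) | 0)) {
    const uint8_t *q = read_varint32(p + 1, end, &v);
    if (q && q < end && *q == ((2 << 3) | 0)) {
      int32_t a = (int32_t)v;
      q = read_varint32(q + 1, end, &v);
      if (q == end) { /* fast path hit: whole message consumed */
        m->a = a;
        m->b = (int32_t)v;
        return 1;
      }
    }
  }
  return parse_general(m, p, end); /* unexpected layout: fall back */
}
```

The design point is that the fallback makes the speculation safe: a miss costs a re-parse from the start, while the common ascending-order case avoids generic dispatch entirely.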
mananaysiempre 3 days ago
> More recently, we've moved towards making Protobuf parsing more data-driven, where each field's schema is compiled into data that is passed as an argument to a generic Protobuf parser function. We call this "table-driven parsing", and from my read of the blog article, I believe this is what Miguel is doing with hyperpb.

Everything old is new again, I guess—one of the more advertised changes in Microsoft COM as it was maturing (circa 1995) was that you could use data-driven marshalling with “NDR format strings” (bytecode, essentially[1,2]) instead of generating C code. Shortly after there was typelib marshalling (format-compatible but limited), and much later also WinRT’s metadata-driven marshalling (widely advertised but basically completely undocumented). Fabrice Bellard’s nonfree ASN.1 compiler[3] is also notable for converting schemas into data rather than code, unlike most of its open-source alternatives.

I still can’t help wondering what it is, really, that makes the bytecode-VM approach advantageous. In 1995, the answer seems simple: the inevitable binary bloat was unacceptable for a system that needs to fit into single-digit megabytes of RAM; and of course bytecode as a way to reduce code footprint has plenty of prior art (microcomputer BASICs, SWEET16, Forth, P-code, even as far back as the AGC). Nowadays, the answer doesn’t seem as straightforward. Sure, the footprint is enormous if you’re Google, but you’re not Google (you’re probably not even Cloudflare), and besides, I hope you’re not Google and can design stuff that’s actually properly adapted to static linking. Sure, the I$ pressure is significant (thinking back to Mike Pall’s explanation[4] of why he didn’t find a baseline JIT to be useful), but the bytecode interpreter isn’t going to be a speed demon either. I don’t get it. I can believe it’s true, but I don’t really feel that I get it.

[1] https://learn.microsoft.com/en-us/windows/win32/rpc/rpc-ndr-...

[2] https://gary-nebbett.blogspot.com/2020/04/rpc-ndr-engine-dce...
benreesman 2 days ago
Regular expression engines on this model are often called VMs; certainly that's the terminology and layout I used. A big switch/computed-goto loop with state codes? Yeah, that's a VM. I think it's a very illuminating way to describe it. Nicely done with the implementation as well.
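For the curious, that shape can be shown in miniature (a toy, with invented opcodes and a hand-compiled program, not any production engine): a regex compiled to bytecode, matched by the familiar loop-over-opcodes interpreter.

```c
#include <stddef.h>

/* Toy regex bytecode VM: the compiled pattern is an array of
 * instructions, and matching is a loop dispatching on opcodes. */
enum { OP_CHAR, OP_SPLIT, OP_JMP, OP_MATCH };

typedef struct {
  int op;   /* opcode */
  int c;    /* OP_CHAR: the character to match */
  int x, y; /* OP_SPLIT: two branch targets; OP_JMP: target in x */
} Inst;

/* Backtracking interpreter: OP_SPLIT tries branch x first, then y.
 * (Production engines avoid this recursion; it's kept for brevity.) */
static int run(const Inst *prog, int pc, const char *sp) {
  for (;;) {
    const Inst *i = &prog[pc];
    switch (i->op) {
      case OP_CHAR:
        if (*sp != i->c) return 0;
        pc++;
        sp++;
        break;
      case OP_JMP:
        pc = i->x;
        break;
      case OP_SPLIT:
        if (run(prog, i->x, sp)) return 1;
        pc = i->y;
        break;
      case OP_MATCH:
        return 1;
    }
  }
}

/* Hand-compiled program for the regex "a+b" (matches a prefix of s):
 *   0: CHAR 'a'
 *   1: SPLIT 0, 2   ; greedily loop for more 'a's, else fall through
 *   2: CHAR 'b'
 *   3: MATCH */
static const Inst aplusb[] = {
  { OP_CHAR,  'a', 0, 0 },
  { OP_SPLIT,  0,  0, 2 },
  { OP_CHAR,  'b', 0, 0 },
  { OP_MATCH,  0,  0, 0 },
};

int match_aplusb(const char *s) { return run(aplusb, 0, s); }
```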
pjc50 2 days ago
> This means that the overall structure of a protobuf parser is conceptually a while() loop surrounding a switch() statement, just like a VM interpreter.

This is a very insightful observation. Encoding the parser in a table is an old technique; it's what YACC uses.

There's a tradeoff between using the CPU's stack+registers versus maintaining your own stack+state in this kind of work, and people have found situations where a small VM is faster because it gets to stay in the instruction cache and benefit from branch prediction, while the less-predictable data stays in the d-cache.
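A minimal sketch of that table encoding (all names and layouts here are invented for illustration; real tables, whether YACC's or upb's, carry far more per-field information): the schema is compiled to data, and one generic loop interprets it for every message type.

```c
#include <stdint.h>
#include <stddef.h>

/* One table entry per field (invented layout, for illustration). */
typedef struct {
  uint32_t tag;    /* field_number << 3 | wire_type */
  uint16_t offset; /* where the decoded int32 lands in the struct */
} FieldEntry;

typedef struct {
  const FieldEntry *fields;
  size_t count;
} MiniTable;

/* Varint decoder (32-bit cap, no overflow handling in this sketch). */
static const uint8_t *read_varint32(const uint8_t *p, const uint8_t *end,
                                    uint32_t *out) {
  uint32_t val = 0;
  int shift = 0;
  while (p < end && shift < 32) {
    uint8_t byte = *p++;
    val |= (uint32_t)(byte & 0x7f) << shift;
    if (!(byte & 0x80)) { *out = val; return p; }
    shift += 7;
  }
  return NULL;
}

/* One generic parser serves every message; the table is the "program"
 * it interprets. A linear scan stands in for the faster dispatch
 * schemes (hashing, tail calls) that real table-driven parsers use. */
int parse_with_table(void *msg, const MiniTable *t,
                     const uint8_t *p, size_t len) {
  const uint8_t *end = p + len;
  while (p < end) {
    uint32_t tag, val;
    p = read_varint32(p, end, &tag);
    if (!p) return 0;
    size_t i = 0;
    while (i < t->count && t->fields[i].tag != tag) i++;
    if (i == t->count) return 0; /* unknown field: a real parser skips */
    p = read_varint32(p, end, &val);
    if (!p) return 0;
    *(int32_t *)((char *)msg + t->fields[i].offset) = (int32_t)val;
  }
  return 1;
}

/* Example table for a hypothetical
 *   message Point { int32 x = 1; int32 y = 2; } */
typedef struct { int32_t x, y; } Point;

static const FieldEntry point_fields[] = {
  { (1 << 3) | 0, offsetof(Point, x) },
  { (2 << 3) | 0, offsetof(Point, y) },
};
static const MiniTable point_table = { point_fields, 2 };
```

Note how the table plus the generic loop replaces the per-message switch: only the data changes between message types, which is exactly what keeps the hot code small enough to stay in the instruction cache.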