crabbone 2 days ago

Not enough programmers talk to QA, and so are oblivious to testing.

The benchmarking here is... pointless. Here's my story (of writing a Protobuf parser) to illustrate why. I've written about it before, but it keeps happening, so maybe repeating it is not such a bad thing.

So... the official Protobuf parser for Python (the one from Google) is ridiculously bad. It generates Python code for the parser; not only is this unnecessary, it's also executed poorly. So I decided to roll my own, and to make it multi-threaded (I wrote it in C, with Python bindings).

Then I tried to benchmark it against the official implementation. I don't trust the quality of the Python language implementation, so that came under the spotlight first. For example, I chose Python's Enum to represent Protobuf enums. That turned out to be a horrible decision, because instantiating Python Enum members is extremely resource-intensive. I had to make a few similar adjustments, searching for less resource-hungry Python built-in data structures.
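
For a sense of the scale of that overhead, here is a minimal micro-benchmark sketch (my own illustration, not the parser code; the Color enum is made up). It compares constructing an Enum member from a wire value against passing a plain int through:

    import enum
    import timeit

    class Color(enum.Enum):
        RED = 0
        GREEN = 1
        BLUE = 2

    # Turning a wire value into an Enum member goes through Enum.__call__ and a
    # value->member lookup; a plain int passes through essentially unchanged.
    enum_lookup = timeit.timeit(lambda: Color(1), number=1_000_000)
    int_passthrough = timeit.timeit(lambda: int(1), number=1_000_000)

    print(f"Enum(value): {enum_lookup:.3f}s  int(value): {int_passthrough:.3f}s")

On CPython the Enum path comes out several times slower, and that adds up when every enum field of every message pays it.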

And yet I was still far behind on my benchmarks. So I tried to understand what Google's C++ code was doing, and then I realized that Google's parser, in the "first pass", only extracted the top-level messages, not even trying to parse the hierarchical structure. That structure was only extracted on demand. So, essentially, I was competing against memcpy()...
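
Roughly, the idea looks like the following sketch (my own reconstruction of the approach, not Google's code): the first pass just walks the wire format and records where each top-level field's bytes sit in the buffer, which is not far from a memcpy, and actual decoding is deferred until a field is accessed:

    # Decode a protobuf varint starting at pos; return (value, new_pos).
    def read_varint(buf: bytes, pos: int):
        result = shift = 0
        while True:
            b = buf[pos]
            result |= (b & 0x7F) << shift
            pos += 1
            if not b & 0x80:
                return result, pos
            shift += 7

    # Single pass: map field number -> (offset, length) of its raw bytes.
    # Only varint and length-delimited (wire type 2) fields are handled here.
    def index_top_level(buf: bytes):
        index, pos = {}, 0
        while pos < len(buf):
            key, pos = read_varint(buf, pos)
            field_no, wire_type = key >> 3, key & 0x07
            if wire_type == 0:                   # varint: just skip its bytes
                _, pos = read_varint(buf, pos)
            elif wire_type == 2:                 # length-delimited: record the span
                length, pos = read_varint(buf, pos)
                index[field_no] = (pos, length)
                pos += length
            else:
                raise NotImplementedError(f"wire type {wire_type} not handled")
        return index

    # Field 1 is a length-delimited payload; it is located but never decoded
    # until someone actually asks for it.
    msg = bytes([0x0A, 0x03]) + b"abc"           # key=(1<<3)|2, len=3, "abc"
    offset, length = index_top_level(msg)[1]
    print(msg[offset:offset + length])           # b'abc'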

But here's another catch: most real-world applications will not want the "intermediate" Protobuf structure made up of general-purpose data structures such as dictionaries or enums. They want the data parsed directly into domain-specific data structures. So a parser that offers this kind of functionality might be slower when parsing into generic data structures, but will win in real-world benchmarks.
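
To make the distinction concrete, here is a toy contrast (the User type and field names are invented): the same decoded fields can either be handed back as a generic dict that the application then has to re-walk, or built straight into a domain object in the same pass:

    from dataclasses import dataclass

    @dataclass
    class User:
        id: int
        name: str

    # Intermediate form: the caller still has to convert it afterwards.
    def parse_generic(fields: dict) -> dict:
        return {"id": fields["id"], "name": fields["name"]}

    # Domain form: one pass, nothing left for the caller to re-walk.
    def parse_domain(fields: dict) -> User:
        return User(id=fields["id"], name=fields["name"])

    print(parse_domain({"id": 7, "name": "alice"}))

A benchmark that stops at the generic dict rewards the first kind of parser; a benchmark that measures time to usable domain objects can easily reward the second.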

Of course, the percentage of useful payload is also important. Typically, applications underutilize the messages they exchange, but to what degree varies and depends entirely on application design. Similarly, the cost of mapping into domain-specific objects depends on the sort of information the application wants to exchange. Protobuf is bad for columnar / tabular data, but somewhat better for object-graph-style data; it is bad for large messages with duplicated content, but somewhat better for short messages with little duplication.

All of this, and much more, allows for a lot of variability in benchmark design, and would let me, or anyone else writing a parser, skew the numbers in whatever direction they want... to amuse the general public.