Const-me a day ago

> Actually there is - you can exploit the data parallelism

That doesn’t help much when you’re streaming the bytes, like many parsers or de-serializers do. You have to read bytes from the source stream one by one, because each next byte might be the last one of the integer being decoded. You could work around that with a custom buffering adapter, but it’s hard to do correctly, costs performance, and in some cases is even impossible: when the encoded integers arrive in real time from the network, trying to read the next few bytes into a RAM buffer may block.

With MKV variable integers, you typically know the encoded length from just the first byte, or in the very rare case of huge uint64 numbers, from the first 2 bytes. Once you know the encoded length, you can read the rest of the encoded bytes with a single read call.
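
For illustration, a minimal sketch of such a decoder. It assumes GCC/Clang’s __builtin_clz, and readFn is a caller-supplied stand-in for whatever blocking read the transport provides; the rare encodings longer than 8 bytes are left out:

    #include <cstdint>

    // readFn(buf, n) must read exactly n bytes from the source stream.
    // Assumes the first byte is non-zero, i.e. encoded lengths of 1..8 bytes.
    template <typename ReadFn>
    uint64_t readMkvVarInt(ReadFn&& readFn) {
        uint8_t first;
        readFn(&first, 1);
        // Leading zero bits in the first byte + 1 = total encoded length.
        const int len = __builtin_clz(static_cast<uint32_t>(first)) - 24 + 1;
        uint64_t value = first & (0xFFu >> len); // drop the length-marker bits
        if (len > 1) {
            uint8_t rest[7];
            readFn(rest, len - 1); // a single read fetches all remaining bytes
            for (int i = 0; i < len - 1; ++i)
                value = (value << 8) | rest[i];
        }
        return value;
    }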

Another thing: unless you are running on a modern AMD64 CPU which supports the PDEP/PEXT instructions (part of the BMI2 ISA extension), it’s expensive to split/merge the input bits to/from the groups of 7 bits in every encoded byte. MKV variable integers don’t do that; they only need bit scan and byte swap, and both instructions are available on all modern mainstream processors, including ARM, and are very cheap.
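
With BMI2 available, the 7-bit regrouping does collapse to a single instruction each way. A sketch (requires compiling with -mbmi2 and a CPU that has it; on other hardware this falls back to slow shift/mask sequences):

    #include <cstdint>
    #include <immintrin.h> // BMI2 intrinsics: _pdep_u64 / _pext_u64

    // Gather the 7-bit payload groups of 8 LEB128-encoded bytes into a
    // contiguous 56-bit value with a single PEXT instruction.
    uint64_t gatherLeb128Bits(uint64_t eightEncodedBytes) {
        return _pext_u64(eightEncodedBytes, 0x7F7F7F7F7F7F7F7Full);
    }

    // The reverse, for encoding: PDEP scatters up to 56 bits of the value
    // into 7-bit-per-byte groups, ready for continuation bits to be OR-ed in.
    uint64_t scatterLeb128Bits(uint64_t value) {
        return _pdep_u64(value, 0x7F7F7F7F7F7F7F7Full);
    }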

menaerus a day ago | parent [-]

But you never serialize byte by byte over the network. You encode, let’s say, 1000 varints and then send them out on the wire. On the other end, you may or may not know how many varints there are to unpack, but since you certainly have to know the length of the packet, you can start decoding the stream of bytes. The largest 64-bit LEB128 number occupies 10 bytes, whereas the largest 32-bit one occupies 5 bytes, so we know the upper bound.

Const-me a day ago | parent [-]

> you never serialize byte by byte over the network

I sometimes do for the following two use cases.

1. When the protocol delivers real-time data, I serialize byte by byte over the network to minimize latency.

2. When the messages in question are large, like 1GB or more, I don’t want to waste that much memory on both ends of the socket to buffer complete messages the way Protobuf does. On the sending side, I want to produce the message, encode it at the same time with a streaming serializer, and stream bytes directly into the socket. Similarly, on the receiving side I want to de-serialize bytes as they come with a streaming de-serializer. Without buffering the complete message on the sending side, the receiving side can’t know the length of the message in advance, because the sender only knows the length after the complete message is serialized.

menaerus a day ago | parent [-]

I still don't understand: are you really sending byte by byte over TCP/UDP? Because that would be detrimental to performance.

Const-me a day ago | parent [-]

For the sending side, a small buffer (which should implement a flush method, called after serializing the complete message) indeed helps amortize the cost of system calls. However, a buffer large enough for 1GB messages would waste too much memory. And without such a buffer on the sender side, it’s impossible to prepend every message with the length: you don’t know the length of a serialized message until the entire message is serialized.
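
A minimal sketch of that sender-side buffer; sendToSocket is a hypothetical stand-in for whatever send()/write() call the transport uses:

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    void sendToSocket(const uint8_t* data, size_t len); // hypothetical transport call

    class BufferedSender {
        uint8_t buf[4096];
        size_t used = 0;
    public:
        void write(const uint8_t* data, size_t len) {
            if (used + len > sizeof(buf)) flush();
            if (len >= sizeof(buf)) { sendToSocket(data, len); return; } // oversized writes bypass the buffer
            memcpy(buf + used, data, len);
            used += len;
        }
        // Call once after the complete message is serialized,
        // so the tail bytes don't sit in the buffer.
        void flush() {
            if (used) { sendToSocket(buf, used); used = 0; }
        }
    };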

With streamed serialization, the receiver doesn’t know when the message will end. This generally means you can’t optimize LEB128 decoding by testing the high bits of 10 bytes at once.
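
For reference, this is roughly what that optimization looks like; the 8-byte load is exactly the over-read a streamed message can’t allow. The sketch assumes a little-endian CPU and GCC/Clang’s __builtin_ctzll:

    #include <cstdint>
    #include <cstring>

    // Returns the varint length (1..8), or -1 if no terminator byte (high bit
    // clear) appears in the 8 loaded bytes. Only safe when at least 8 readable
    // bytes are guaranteed past p - which a streamed message can't promise.
    int varintLengthIn8Bytes(const uint8_t* p) {
        uint64_t chunk;
        memcpy(&chunk, p, 8); // may touch bytes past the end of the message
        const uint64_t terminators = ~chunk & 0x8080808080808080ull;
        if (!terminators)
            return -1; // the varint continues beyond these 8 bytes
        return static_cast<int>(__builtin_ctzll(terminators) >> 3) + 1;
    }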

For example, let’s say the message is a long sequence of strings, serialized into a sequence of [ length, payload ] pairs: length is a var.int, payload is an array of UTF-8 bytes of that length, and the message is terminated with a zero-length string.
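
A minimal sketch of consuming that message with a streaming de-serializer, using the same hypothetical blocking readFn as above (validation of over-long LEB128 encodings omitted). Note the length has to be decoded one read at a time:

    #include <cstdint>
    #include <string>
    #include <vector>

    // Any byte of the length may be the last byte of the whole message,
    // so the LEB128 length is decoded byte by byte.
    template <typename ReadFn>
    void readStringSequence(ReadFn&& readFn, std::vector<std::string>& out) {
        for (;;) {
            uint64_t len = 0;
            int shift = 0;
            uint8_t b;
            do {
                readFn(&b, 1); // one read call per encoded byte
                len |= uint64_t(b & 0x7F) << shift;
                shift += 7;
            } while (b & 0x80);
            if (len == 0)
                return; // zero-length string terminates the message
            std::string s(static_cast<size_t>(len), '\0');
            readFn(s.data(), len); // payload: UTF-8 bytes of that length
            out.push_back(std::move(s));
        }
    }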

You can’t implement a data-parallel LEB128 decoder for the length field in that message by testing multiple bytes at once, because it may consume bytes past the end of the message. A decoder for MKV variable integers only needs 1-2 read calls to decode even large numbers, because just the first byte contains the encoded length of the var.int.