Const-me a day ago

For the sending side, a small buffer (with a flush method called after serializing the complete message) indeed helps amortize the cost of system calls. However, a buffer large enough for 1 GB messages wastes too much memory. And without such a buffer on the sender side, it's impossible to prepend every message with its length: you don't know the length of a serialized message until the entire message is serialized.
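A rough sketch of what I mean on the sender side (hypothetical names, not anyone's actual API): the scratch buffer has to hold the whole message, because the length prefix can only be written at flush time.

    #include <cstdint>
    #include <vector>

    struct MessageWriter {
        std::vector<uint8_t> scratch;   // grows to the size of one complete message

        void appendBytes(const uint8_t* p, size_t len) {
            scratch.insert(scratch.end(), p, p + len);
        }

        // Call after the complete message has been serialized into `scratch`.
        // Only now is the length known, so only now can the prefix be written.
        void flush(std::vector<uint8_t>& wire) {
            uint64_t len = scratch.size();
            do {                             // LEB128-encode the length prefix
                uint8_t b = len & 0x7F;
                len >>= 7;
                if (len) b |= 0x80;
                wire.push_back(b);
            } while (len != 0);
            wire.insert(wire.end(), scratch.begin(), scratch.end());
            scratch.clear();
        }
    };

For a 1 GB message that scratch buffer is 1 GB, which is exactly the memory cost I'm talking about.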

With streamed serialization, the receiver doesn't know when the message will end. This generally means you can't optimize LEB128 decoding by testing the high bits of 10 bytes at once.

For example, let's say the message is a long sequence of strings, serialized as a sequence of [ length, payload ] pairs: length is a var.int, payload is an array of UTF-8 bytes of that length, and the message is terminated with a zero-length string.
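A byte-at-a-time parser for that format looks roughly like this (a sketch with made-up names; `end` marks the bytes received so far, which may already contain the start of the next message):

    #include <cstdint>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Decode one LEB128 value reading a single byte at a time. The decoder
    // must consume exactly the bytes of this var.int and no more, so it can't
    // speculatively load 10 bytes and test their high bits in one go.
    static uint64_t readVarint(const uint8_t*& p, const uint8_t* end) {
        uint64_t value = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (p == end) throw std::runtime_error("truncated var.int");
            uint8_t b = *p++;
            value |= uint64_t(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;   // no continuation bit: done
        }
        throw std::runtime_error("var.int too long");
    }

    // Parse [ length, payload ] pairs until the zero-length terminator.
    static std::vector<std::string> readStrings(const uint8_t*& p, const uint8_t* end) {
        std::vector<std::string> result;
        for (;;) {
            uint64_t len = readVarint(p, end);
            if (len == 0) return result;         // zero-length string ends the message
            if (uint64_t(end - p) < len) throw std::runtime_error("truncated payload");
            result.emplace_back(reinterpret_cast<const char*>(p), size_t(len));
            p += len;
        }
    }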

You can't implement a data-parallel LEB128 decoder for the length field in that message by testing multiple bytes at once, because it may consume bytes past the end of the message. By contrast, a decoder for MKV variable integers needs only 1-2 reads to decode even large numbers, because the first byte alone encodes the length of the var.int.
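Something along these lines (a sketch of the EBML-style scheme MKV uses, not production code): the count of leading zero bits in the first byte tells the decoder how many more bytes to fetch, so it never has to guess.

    #include <cstdint>
    #include <stdexcept>

    static uint64_t readEbmlVarint(const uint8_t*& p, const uint8_t* end) {
        if (p == end) throw std::runtime_error("truncated var.int");
        uint8_t first = *p++;
        if (first == 0) throw std::runtime_error("invalid var.int");
        int extra = 0;                       // bytes remaining after the first one
        uint8_t mask = 0x80;
        while ((first & mask) == 0) { extra++; mask >>= 1; }
        uint64_t value = first & (mask - 1); // clear the length-marker bit
        if (end - p < extra) throw std::runtime_error("truncated var.int");
        for (int i = 0; i < extra; i++)      // second read: the remaining bytes
            value = (value << 8) | *p++;
        return value;
    }

One read for the first byte, one read for the rest; no speculation past the end of the field is ever needed.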