tliltocatl 6 days ago

Embedded/constrained UDP is where protobuf wire format (but not Google's libraries) rocks: IoT over cellular and such, where you need to fit everything into a single datagram (number of roundtrips is what determines power consumption). As to those who say "UDP is unreliable" - what you do is you implement ARQ on the application level. Just like TCP does it, except you don't have to waste roundtrips on the SYN/SYN-ACK/ACK handshake or waste bytes on sending data that is no longer relevant.
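
A minimal sketch of what application-level ARQ over UDP could look like, assuming a stop-and-wait scheme. The 3-byte header layout, the `send_reliable` and `handle_datagram` names, and the timeout and retry values are all made up for illustration; they are not from the comment above.

```python
import socket
import struct

# Hypothetical header: packet type (0 = DATA, 1 = ACK) + 16-bit sequence number.
HEADER = struct.Struct("!BH")
DATA, ACK = 0, 1


def send_reliable(sock, addr, seq, payload, timeout=2.0, max_tries=4):
    """Stop-and-wait ARQ: resend the datagram until a matching ACK arrives.

    Returns the number of transmissions it took.
    """
    packet = HEADER.pack(DATA, seq) + payload
    sock.settimeout(timeout)
    for attempt in range(1, max_tries + 1):
        sock.sendto(packet, addr)
        try:
            reply, _ = sock.recvfrom(64)
        except socket.timeout:
            continue  # the datagram or its ACK was lost: retransmit
        if len(reply) >= HEADER.size and HEADER.unpack_from(reply) == (ACK, seq):
            return attempt
    raise TimeoutError(f"no ACK for seq {seq} after {max_tries} tries")


def handle_datagram(sock, seen):
    """Receiver side: ACK every DATA packet, but deliver each sequence number once."""
    packet, addr = sock.recvfrom(1500)
    if len(packet) < HEADER.size:
        return None
    kind, seq = HEADER.unpack_from(packet)
    if kind != DATA:
        return None
    sock.sendto(HEADER.pack(ACK, seq), addr)  # duplicates are re-ACKed as well
    if seq in seen:
        return None  # retransmission of something already delivered
    seen.add(seq)
    return packet[HEADER.size:]
```

Unlike TCP, the sender here is free to give up on a stale reading and send a fresher one under a new sequence number instead of retrying forever.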

Varints for the win. Send time series as columns of varint arrays - delta or RLE compression becomes quite straightforward. And as a bonus I can just implement new fields in the device and deploy right away - the server-side support can wait until we actually need it.
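
As a rough illustration of the delta-plus-varint idea, here is a sketch that encodes one column of integer samples. The varint and zigzag rules follow the protobuf wire format; the `encode_column` and `decode_column` helpers and the choice to delta against an initial value of 0 are assumptions for the example.

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned so small negative deltas stay small (64-bit range)."""
    return (n << 1) ^ (n >> 63)


def unzigzag(z: int) -> int:
    return (z >> 1) ^ -(z & 1)


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    """Return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_column(samples):
    """Store each sample as the zigzag varint of its difference from the previous one."""
    out = bytearray()
    prev = 0
    for s in samples:
        out += encode_varint(zigzag(s - prev))
        prev = s
    return bytes(out)


def decode_column(buf: bytes):
    samples, prev, pos = [], 0, 0
    while pos < len(buf):
        z, pos = decode_varint(buf, pos)
        prev += unzigzag(z)
        samples.append(prev)
    return samples
```

With slowly changing readings such as `[2150, 2151, 2151, 2149, 2153]`, only the first sample needs two bytes and every later delta fits in one, so the column takes 6 bytes.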

No, flatbuffers/cap'n'proto are unacceptably big because of fixed layout. No, CBOR is an absolute no go - why on earth would you waste precious bytes on schema every time? No, general-purpose compression like gzip wouldn't do much on such a small size, it will probably make things worse. Yes, ASN is supposed to be the right solution - but there is no full-featured implementation that doesn't cost $$$$ and the whole thing is just too damn bloated.

Kinda fun that it sucks for what it is supposed to do, but actually shines elsewhere.

henningpeters 6 days ago | parent | next [-]

> why on earth would you waste precious bytes on schema every time

cbor doesn't prescribe sending schema, in fact there is no schema, like json.

i just switched from protobuf to cbor because i needed better streaming support and find it quite delightful to use. losing protobuf schema hurts a bit, but the amount of boilerplate code is actually less than what i had before with nanopb (embedded context). on top of it, i am saving approx. 20% in message size compared to protobuf bc i am using mostly arrays with fixed position parameters.

tliltocatl 6 days ago | parent [-]

> cbor doesn't prescribe sending schema, in fact there is no schema, like json.

You are right, I must have confused CBOR with BSON where you send field names as strings.

>on top of it, i am saving approx. 20% in message size compared to protobuf bc i am using mostly arrays with fixed position parameters

Arrays with fixed positions are always going to be the most compact format, but that means that you essentially give up on serialization. Also, when you have a large structure (e.g. the full set of device state and settings) where most of the fields only change infrequently, it makes sense to only send what's changed, and then TLV is significantly better.
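
To make the trade-off concrete, here is a small sketch comparing a positional encoding of the whole state with a tagged encoding of only the changed fields. The tag byte follows the protobuf rule `(field_number << 3) | wire_type`; the helper names and the restriction to non-negative integer fields are assumptions for the example.

```python
def varint(n: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_positional(values) -> bytes:
    """Fixed-position array: no tags, but every field is sent every time."""
    return b"".join(varint(v) for v in values)


def encode_tagged(fields: dict) -> bytes:
    """Protobuf-style: each field carries a tag, so absent fields cost nothing."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        out += varint((number << 3) | 0)  # wire type 0 = varint
        out += varint(value)
    return bytes(out)


def changed_fields(old: dict, new: dict) -> dict:
    """Only the fields whose value differs from the last state that was sent."""
    return {n: v for n, v in new.items() if old.get(n) != v}
```

For a 20-field state where every value fits in one byte, the positional form is 20 bytes and the fully tagged form is 45 bytes (field numbers above 15 need a two-byte tag). If only field 3 changed, the tagged update is 2 bytes.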

cryptonector 5 days ago | parent | prev | next [-]

> Yes, ASN is supposed to be the right solution - but there is no full-featured implementation that doesn't cost $$$$ and the whole thing is just too damn bloated.

Oh for crying out loud! PB had ZERO tooling available when it was created! It would have been much easier to create ASN.1 tooling w/ OER/PER and for some suitable subset of ASN.1 in 2001 than it was to a) create an IDL, b) create an encoding, and c) write tooling for N programming languages.

In fact, one thing one could have done is write a transpiler from the IDL to an AST that does all linting, analysis, and linking, and which one can then use to drive codegen for N languages. Or even better: have the transpiler produce a byte-coded representation of the modules and then for each programming language you only need to codegen the types but not the codecs -- instead for each language you need only write the interpreter for the byte-coded modules. I know because I've extended and maintained an [open source] ASN.1 compiler that fucking does [some of] these things.
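
A toy sketch of the "byte-coded module plus interpreter" idea, purely for illustration. The opcode set, the `SENSOR_REPORT` program, and the wire layout are invented here and do not correspond to any real ASN.1 compiler or encoding rule set.

```python
import struct

# Hypothetical opcodes a schema compiler might emit, one per member.
U8, U16, BYTES = "u8", "u16", "bytes"

# Hypothetical compiled module: an ordered list of (opcode, member name).
SENSOR_REPORT = [(U16, "device_id"), (U8, "battery_pct"), (BYTES, "payload")]


def encode(program, record) -> bytes:
    """Generic encoder: walks the program instead of using generated codec code."""
    out = bytearray()
    for op, name in program:
        value = record[name]
        if op == U8:
            out += struct.pack("!B", value)
        elif op == U16:
            out += struct.pack("!H", value)
        elif op == BYTES:
            out += struct.pack("!B", len(value)) + value  # 1-byte length prefix
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return bytes(out)


def decode(program, buf: bytes) -> dict:
    """Generic decoder driven by the same program."""
    record, pos = {}, 0
    for op, name in program:
        if op == U8:
            (record[name],) = struct.unpack_from("!B", buf, pos)
            pos += 1
        elif op == U16:
            (record[name],) = struct.unpack_from("!H", buf, pos)
            pos += 2
        elif op == BYTES:
            (length,) = struct.unpack_from("!B", buf, pos)
            record[name] = bytes(buf[pos + 1:pos + 1 + length])
            pos += 1 + length
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return record
```

Supporting another language then means porting `encode` and `decode` once, while the per-schema output is only data (plus type definitions, if the language wants them).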

Stop spreading this idea that ASN.1 is bloated. It's not. You can cut it down for your purposes. There are only 4 specifications for the language itself, of which the base one (X.680) is enough for almost everything (the others, X.681, X.682, and X.683, are mainly for parameterized types and formal typed hole specifications [the ASN.1 "information object system"], which are awesome but you can live without). And these are some of the best-written and most-readable specifications ever written by any standards development organization -- they are a great gift from a few to all of mankind.

tliltocatl 5 days ago | parent [-]

> It would have been much easier to create ASN.1 tooling w/ OER/PER and for some suitable subset of ASN.1 in 2001

Just by looking at your past comments - I agree that if Google reused ASN.1, we would have lived in a better world. But the sad reality now is that PB has tons of FOSS tooling and ASN.1 barely any (is there any free embedded-grade implementation other than asn1cc?) and figuring out what features you can use without having to pledge your kidney and soul to Nokalva is a bit hard.

I tried playing with ASN.1 before settling on protobuf. Don't recall which compiler I used, but immediately figured out that apparently the datetime datatype is not supported, and the generated C code was a bloated mess (so is Google's protobuf - but not nanopb). Protobuf, on the other hand, was quite straightforward on what is and is not supported. So us mortals who aren't Google and have a hard time justifying writing serdes from scratch gotta use what's available.

> Stop spreading this idea that ASN.1 is bloated

"Bloated" might be the wrong word - but it is large and it's damn hard for someone designing a new application to figure out which part is safe to use, because most sources focus on using it for decoding existing protocols.

cryptonector 5 days ago | parent [-]

For sure PB is a fact of life now. A regrettable fact of life, but perhaps a lesson (that few will heed).

grogers 5 days ago | parent | prev [-]

Other than ASN.1 PER, is there any other widely used encoding format that isn't self-describing? Using TLV certainly adds flexibility around schema evolution, but I feel like collectively we are wasting a fair amount of bytes because of it...

tliltocatl 5 days ago | parent | next [-]

Cap'n'proto doesn't have tags, but it wastes even more bytes in favor of speed. Then again, omitting tags only saves space if you are sending all the fields every time. PER uses a bitmap, which is still a bit wasteful on large sparse structs.

cryptonector 4 days ago | parent [-]

PER sends a bitmap only of OPTIONAL members' (fields') presence/absence. Required members are just where you expect them: right after their preceding members.
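
A simplified sketch of the presence-bitmap layout described here. It is not real PER: the bitmap is padded to whole bytes and every value is a single byte, just to keep the example short. The `FIELDS` schema and function names are invented for illustration.

```python
# Hypothetical schema: (member name, is OPTIONAL). Values are one byte each (0..255).
FIELDS = [("id", False), ("temp", True), ("humidity", True), ("note", True)]
OPTIONAL = [name for name, optional in FIELDS if optional]
BITMAP_BYTES = (len(OPTIONAL) + 7) // 8


def encode(record: dict) -> bytes:
    """Presence bitmap for OPTIONAL members, then the present members in schema order."""
    bitmap = 0
    for name in OPTIONAL:
        bitmap = (bitmap << 1) | int(name in record)
    bitmap <<= BITMAP_BYTES * 8 - len(OPTIONAL)  # left-align, pad with zero bits
    out = bytearray(bitmap.to_bytes(BITMAP_BYTES, "big"))
    for name, optional in FIELDS:
        if not optional or name in record:
            out.append(record[name])  # required members are always written
    return bytes(out)


def decode(buf: bytes) -> dict:
    bitmap = int.from_bytes(buf[:BITMAP_BYTES], "big")
    pos, opt_index, record = BITMAP_BYTES, 0, {}
    for name, optional in FIELDS:
        if optional:
            present = (bitmap >> (BITMAP_BYTES * 8 - 1 - opt_index)) & 1
            opt_index += 1
            if not present:
                continue
        record[name] = buf[pos]
        pos += 1
    return record
```

In this layout the required `id` member never appears in the bitmap, and each absent OPTIONAL member costs exactly one bit.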

cryptonector 4 days ago | parent | prev | next [-]

Also JSON and XML are not TLV, though of course they're not really good examples of non-TLV encodings -- certainly they can't be what you had in mind.

cryptonector 5 days ago | parent | prev [-]

OER (related to PER)

XDR (ONC RPC, NFS)

MS RPC (DCE RPC w/ tweaks)

FlatBuffers