| ▲ | vlovich123 17 hours ago |
| Protobufs definitely doesn’t solve the problems described. Capnproto may solve it but I’m not 100% sure. JSON/XML/ASN.1 definitely don’t. It’s like you listed a bunch of serialization technologies without grokking that the problem outlined in the post doesn’t have much to do with rkyv itself. |
|
| ▲ | 8 hours ago | parent | next [-] |
| [deleted] |
|
| ▲ | zadikian 6 hours ago | parent | prev | next [-] |
| Pretty sure protobuf uses a header to track field presence within a message, similar to what this article does. That does have its own overhead you could avoid if you knew all fields were present, but that's not the assumption it makes. |
|
| ▲ | UqWBcuFx6NV4r 9 hours ago | parent | prev | next [-] |
| I have zero doubt that you’re on some ‘no true Scotsman’-style “you’re not doing Real Development if you are using these technologies to solve these problems” thing. Let’s just drop that. There are myriad ‘real man webscale development’ scenarios where these are more than acceptable. |
|
| ▲ | imtringued 13 hours ago | parent | prev | next [-] |
| Actually, it's you who is giving that impression with an ultra-vague "doesn't solve the problems described". The only problem in the blog post is efficient encoding of optional fields, and all they did was introduce a bitmap. From that perspective, JSON and XML solve the optional-fields problem to perfection, since an absent field costs exactly nothing. |
| |
| ▲ | vlovich123 8 hours ago | parent [-] |
| I guess you missed the part where the size of the data stored on disk and efficient deserialization are also critically important performance characteristics that neither JSON nor XML have? Capnproto doesn’t support transform on serialize - the optional fields still take up disk space unless you use the packed representation, which has some performance drawbacks. Also, the generated capnproto Rust code is quite heavy on compile times, which is probably a consideration that’s important for compiling queries. |
| ▲ | zadikian 4 hours ago | parent [-] |
| Indeed, Capnproto is more optimized for serdes time than space usage. |
|
|
|
| ▲ | locknitpicker 16 hours ago | parent | prev [-] |
| > Protobufs definitely doesn’t solve the problems described. Capnproto may solve it but I’m not 100% sure. JSON/XML/ASN.1 definitely don’t. |
|
| I'm not sure you are serious. What open problem do you have in mind? Support for persisting and deserializing optional fields? Mapping across data types? I mean, some JSON deserializers support deserializing sparse objects even to dictionaries. In .NET you can even deserialize random JSON objects to a dynamic type. Can you be a little more specific about your assertion? |
| |
| ▲ | vlovich123 8 hours ago | parent [-] |
| The space overhead and the overhead of serialization/deserialization. Rkyv is zero overhead - it’s random access without needing to deserialize and can even be memory mapped. |
| ▲ | cozzyd 2 hours ago | parent | next [-] |
| If you care about space, you're almost certainly going to compress your output (unless, like, you're literally storing random noise), and so you'll necessarily have overhead from that. Unless the reason you care about space is that it's some sort of wire protocol for a slow network (like LoRaWAN or Iridium packets or a binary UART protocol), where compression probably doesn't make sense because the compression overhead is too large. But even here, just defining the data layout makes sense, I think. This could take the form of a C struct with __attribute__((packed)), but that is fragile if you care about more platforms than one. (I generally don't, so that works for me!) |
| ▲ | amluto 7 hours ago | parent | prev [-] |
| The whole “zero overhead” thing is IMO a red herring. I care about a few things: stability across versions and languages, space efficiency (sometimes), and performance. I do not care about “overhead” — performance trumps overhead every time. Your deserializer is probably running on a CPU, and that CPU probably has a very fast L1 cache and might be targeted by a compiler that can do scalar replacement of aggregates and such. A non-zero-overhead deserializer can run very quickly and result in the output being streamed efficiently from its source and ending up hot in L1 in a useful format. A zero-overhead deserializer might do messy reads in a bad order without streaming hints and run much slower. And then you get to very, very large records, as in the OP, where getting a good on-disk layout may require thought. And, frequently, the right layout isn't even array-of-structs, which is why there are so many tools designed to query column stores like Parquet efficiently. |
| ▲ | zadikian 4 hours ago | parent [-] |
| Serdes time can be significant. There are use cases for zero-copy formats even though they use more space. Likewise, bit-packed ASN.1 is often slower than byte-aligned. |
|
|
|