instig007 6 days ago

> Real-world practice has also shown that quite often, fields that originally seemed to be "required" turn out to be optional over time

how often? as practiced by who, and where?

> 2. You actually do not want a oneof field to be repeated!

> How do you make this change without breaking compatibility? Now you wish that you had defined your array as an array of messages, each containing a oneof, so that you could add a new field to that message. But because you didn't, you're probably stuck creating a parallel array to store your new field. That sucks.

Nice, "explain to me how you're going to implement a backward-compatible SUM in the spec-parser that doesn't have the notions needed. Ha! You can't! Told you so!"

> But because you didn't, you're probably stuck creating a parallel array to store your new field. That sucks.

Not really. `oneof token` is isomorphic to `oneof (token unit)`, and going from the former to the latter doesn't require any change to the binary encoding, if the encoding is optimal. Getting from `oneof (token unit)` to `oneof (token { linepos })` doesn't, depending on the binary encoding format you design, require changes to the parser's runtime, as long as the parser takes into account that `unit` is isomorphic to the zero-arity product `{}`. Since both `{}` and `{ linepos }` can be represented with fixed positional addressing, you get your values in a backward-compatible way.

This holds under one specific condition: the parser library API provides `repeated (oneof <A>)` as a non-materialised stream of values <A>, so that the exact interpretation of <A> happens at the user's call site, according to the stated protocol spec. If the spec says `<A> = token`, then `list (repeated (oneof (token { linepos })))` is isomorphic to `list (repeated (oneof token))` in the deployed version of the protocol that knows nothing about line positions, so my endpoints can send you any of:

    * Version 0: [len][oneof_bincode][token_arr]

    * Version 1: [len][oneof_bincode_sum][token_arr][unit]

    * Version 2: [len][oneof_bincode_sumprod][token_arr][prod_arr]

    * Version 3: [len][oneof_bincode_sumprod_sparse][token_arr][presence_arr][prod_arr]
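
A minimal sketch of the "Version 0" and "Version 2" layouts listed above, to make the claim concrete. Everything specific here is an assumption for illustration: the layout code values, the little-endian uint32 length, one-byte tokens, and a fixed-width uint32 `linepos`. None of these are given in the comment. It also assumes the deployed decoder already recognises both layout codes.

```python
import struct

# Hypothetical layout codes: the comment names the layouts, not their byte values.
TOKENS_ONLY = 0           # "Version 0": [len][layout code][token_arr]
TOKENS_WITH_PRODUCTS = 2  # "Version 2": [len][layout code][token_arr][prod_arr]

HEADER = struct.Struct("<IB")  # element count (uint32) + layout code (uint8)
LINEPOS = struct.Struct("<I")  # assumed fixed-width product { linepos }


def encode(tokens, lineposes=None):
    """Encode tokens (ints 0..255), optionally with one linepos per token."""
    if lineposes is None:
        return HEADER.pack(len(tokens), TOKENS_ONLY) + bytes(tokens)
    if len(lineposes) != len(tokens):
        raise ValueError("need exactly one linepos per token")
    prod_arr = b"".join(LINEPOS.pack(p) for p in lineposes)
    return HEADER.pack(len(tokens), TOKENS_WITH_PRODUCTS) + bytes(tokens) + prod_arr


def stream(buf):
    """Lazily yield (token, fields); fields is () when there is no product column."""
    count, code = HEADER.unpack_from(buf, 0)
    if code not in (TOKENS_ONLY, TOKENS_WITH_PRODUCTS):
        raise ValueError(f"unknown layout code {code}")
    tokens_at = HEADER.size
    prods_at = tokens_at + count
    for i in range(count):
        token = buf[tokens_at + i]
        if code == TOKENS_WITH_PRODUCTS:
            # Fixed positional addressing: element i's product sits at a known offset.
            yield token, LINEPOS.unpack_from(buf, prods_at + i * LINEPOS.size)
        else:
            yield token, ()


def tokens_only(buf):
    """A caller written against `<A> = token`: it ignores any extra fields."""
    return [token for token, _fields in stream(buf)]


v0 = encode([1, 2, 3])
v2 = encode([1, 2, 3], [10, 20, 30])
print(tokens_only(v0) == tokens_only(v2))
print(list(stream(v2)))
```

The caller `tokens_only` reads both buffers identically because interpretation of each element is deferred to the call site, which is the condition the comment states.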

kentonv 6 days ago

> how often? as practiced by who, and where?

This was my experience in Google Search infrastructure circa 2005-2010. This was a system with dozens of teams and hundreds of developers all pushing their data through a common message bus.

It happened all the damned time and caused multiple real outages (from overzealous validation), along with a lot of tech debt involving having to initialize fields with dummy data because they weren't used anymore but still required.

Reports from other large teams at Google, e.g. Gmail, indicated they had the same problems.

> Nice, "explain to me how you're going to implement a backward-compatible SUM in the spec-parser that doesn't have the notions needed. Ha! You can't! Told you so!"

Sure sure, we could expand the type system to support some way of adding a new tag to every element of the repeated oneof, implement it, teach everyone how that works, etc.

Or we could just tell people to wrap the thing in a `message`. It works fine already, and everyone already understands how to do it. No new cognitive load is created, and no time is wasted chasing theoretical purity that provides no actual real-world benefit.

instig007 4 days ago

> This was my experience in Google Search infrastructure circa 2005-2010 [...]

> Reports from other large teams at Google

> teach everyone how that works, etc.

> Or we could just tell people to wrap the thing in a `message`

It really sounds like a self-inflicted internal Google issue. Can you address the part where I mention isomorphism of (oneof token) and (oneof (token {})), and clarify what exactly do you think you'd have to teach other engineers to do, if your protocol's encoders and decoders took this property into account?

kentonv 4 days ago

You seem to have merged the required fields issue and the oneof issue, but these are unrelated threads.

> Can you address the part where I mention isomorphism of (oneof token) and (oneof (token {})), and clarify what exactly do you think you'd have to teach other engineers to do, if your protocol's encoders and decoders took this property into account?

What you have written is not a serious proposal in terms of a working way to extend Protocol Buffers to allow repeated oneofs.

What you have written is a very complicated way of saying: "You theoretically could support extensible repeated oneofs, with the right type system and protocol design."

Yes, I know that. With a clean slate, we can do anything. But in the real world (yes I'm going to keep saying that, since you don't seem very familiar with it), you don't get to start from a clean slate every time you don't like how things have turned out.

As it stands, the product type in protobufs is `message`, the sum type is `oneof`, and the vector type is `repeated`. The way `oneof` is encoded on the wire is that exactly one of the tags appears. The way `repeated` is encoded on the wire is that the same tag appears many times. The way `message` is encoded is that it's a length-delimited byte blob that contains a series of tag-values inside. Unfortunately, this encoding means that if we supported `repeated oneof`, it would not be extensible.
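
A minimal sketch of the wire encoding described above, hand-rolled rather than using a protobuf library. The field numbers (`a = 1`, `b = 2`) and values are assumptions for illustration, and `repeated oneof` is not legal protobuf: the second buffer only shows what such a field would plausibly look like under the existing tag-value rules.

```python
VARINT, LEN = 0, 2  # the two protobuf wire types used in this sketch


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    """A field key is (field_number << 3) | wire_type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)


def varint_field(field_number: int, value: int) -> bytes:
    return tag(field_number, VARINT) + encode_varint(value)


def len_field(field_number: int, payload: bytes) -> bytes:
    """Length-delimited field: used for strings, bytes, and nested messages."""
    return tag(field_number, LEN) + encode_varint(len(payload)) + payload


# Legal today: `repeated int32 xs = 1;` (unpacked) holding 7 and 8.
# The same tag simply appears once per element.
repeated_ints = varint_field(1, 7) + varint_field(1, 8)

# Hypothetical `repeated oneof { int32 a = 1; string b = 2; }` holding
# a=7, b="hi", a=8: a flat run of tag-value pairs in the parent message.
flat = varint_field(1, 7) + len_field(2, b"hi") + varint_field(1, 8)

print(repeated_ints.hex())
print(flat.hex())
```

In the flat buffer nothing marks where one element ends and the next begins, so there is no per-element container that a later schema version could add a field to.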

So we ban `repeated oneof`, and say "you need to write a repeated message, where the message contains a `oneof`". This isn't as pretty as people might like but it works just fine in practice and we move on to more important things.
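
A sketch of why the wrapper works, again hand-rolled and under assumed names. The schema in the comments (`Doc`, `Element`, `line_pos`, and all field numbers) is hypothetical. It shows a decoder written before `line_pos` existed reading data produced after it was added.

```python
# Hypothetical schema:
#   message Element {
#     oneof kind { int32 a = 1; string b = 2; }
#     int32 line_pos = 3;   // added in a later version
#   }
#   message Doc { repeated Element elements = 1; }

def encode_varint(n):
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def varint_field(num, value):
    return encode_varint(num << 3) + encode_varint(value)


def len_field(num, payload):
    return encode_varint((num << 3) | 2) + encode_varint(len(payload)) + payload


def parse_fields(buf):
    """Yield (field_number, value) for varint and length-delimited fields."""
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        num, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        yield num, value


def old_decode(doc):
    """Decoder from before line_pos existed: knows only fields 1 and 2."""
    out = []
    for num, element in parse_fields(doc):
        if num != 1:
            continue
        for field, value in parse_fields(element):
            if field == 1:
                out.append(("a", value))
            elif field == 2:
                out.append(("b", value.decode()))
            # Any other field number is unknown to this version and is skipped.
    return out


# A newer writer attaches line_pos (field 3) inside each element's blob.
new_doc = (
    len_field(1, varint_field(1, 7) + varint_field(3, 10))
    + len_field(1, len_field(2, b"hi") + varint_field(3, 11))
)
print(old_decode(new_doc))
```

Because each element is its own length-delimited blob, the new field travels with its element, and the old decoder steps over it without misreading the neighbouring elements.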

instig007 4 days ago

> What you have written is not a serious proposal in terms of a working way to extend Protocol Buffers to allow repeated oneofs.

I didn't intend to propose a solution for protobuf specifically. I explained why the author of the subject article had a point in calling the authors of protobuf amateurs, given the existing spec, which led to specific parser implementations and their respective downsides.

> But in the real world (yes I'm going to keep saying that, since you don't seem very familiar with it)

I'll have to repeat that the "real-world vs the rest of you" talking point is the specific attitude of (ex-)Google folks that makes them look amateur or, at least, ignorant.

> This isn't as pretty as people might like but it works just fine in practice and we move on to more important things.

That doesn't explain why you didn't implement it differently; you just stated that you did. So why didn't you implement it differently, given what you admitted a few lines above: "Yes, I know that. With a clean slate, we can do anything."