| ▲ | vl 6 days ago | parent | next [-] | | It used to be that there was no official TypeScript protobuf generator from Google and third-party generators sucked. Using protobufs from a web browser or in Node.js was painful. A couple of years ago Connect released a very good generator for TypeScript; we use it in production and it's great: https://github.com/connectrpc/connect-es | |
| ▲ | thinkharderdev 7 days ago | parent | prev [-] | | Yeah, as soon as you have a moderately complex type the generated code is basically useless. Honestly, ~80% of my gripes about protocol buffers could be alleviated by just allowing me to mark a message field as required. | | |
| ▲ | cherrycherry98 7 days ago | parent | next [-] | | Proto2 let you do this, and the "required" keyword was removed because of the problems it introduces when evolving a schema in a system with many users that you don't necessarily control. Let's say you want to add a new required field: if your system receives messages from clients, some of them may still be sending old data without the field, and now the parse step fails because it detects a missing field. If you ever want to remove a required field you have the opposite problem: there will be components that must have the field present just to satisfy the parser, even if they only care about other fields. Philosophically, checking that a field is required or not is data validation and doesn't have anything to do with serialization. You can't specify that an integer falls into a certain valid range, or that a string has a valid number of characters or is in the correct format (e.g. if it's supposed to be an email or a phone number); the application code needs to do that kind of validation anyway. If something really is required, then it's the application's responsibility to deal with it appropriately if it's missing. The Cap'n Proto docs also describe why being able to declare required fields is a bad idea: https://capnproto.org/faq.html#how-do-i-make-a-field-require... | | |
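To make the "validation is the application's job" point concrete, here is a minimal TypeScript sketch; the RawUser shape and validateUser function are hypothetical stand-ins for whatever a generated decoder actually returns. The parser treats every field as optional, and presence/range/format checks happen in a separate application-level step:

    // Hypothetical decoded shape: the wire format treats every field as optional.
    interface RawUser {
      id?: number;
      email?: string;
    }

    // Application-level validation, separate from deserialization. This is where
    // "required", range, and format rules live.
    function validateUser(msg: RawUser): { id: number; email: string } {
      if (msg.id === undefined || msg.id < 0) {
        throw new Error("user.id is required and must be non-negative");
      }
      if (msg.email === undefined || !msg.email.includes("@")) {
        throw new Error("user.email must be a plausible email address");
      }
      return { id: msg.id, email: msg.email };
    }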
| ▲ | thinkharderdev 6 days ago | parent | next [-] | | > Philosophically, checking that a field is required or not is data validation and doesn't have anything to do with serialization. But protocol buffers are not just a serialization format, they're an interface definition language. And not being able to communicate whether a field is required is very limiting. Sometimes things are required to process a message. If you need to add a new field but still be able to process older versions of the message where the field wasn't required (or didn't exist), then you can just add it as optional. I understand that in some situations you have very hard compatibility requirements and it makes sense to make everything optional and deal with it in application code, but adding a required attribute to fields doesn't stop you from doing that. You can still just make everything optional. You can even add a CI lint that prevents people from merging code with required fields. But making required fields illegal at the interface definition level just strikes me as killing a fly with a bazooka. | |
| ▲ | rimunroe 7 days ago | parent | prev | next [-] | | > Philosophically, checking that a field is required or not is data validation and doesn't have anything to do with serialization. My issue is that people seem to like to use protobuf to describe the shape of APIs rather than just as something to handle serialization. I think it's very bad at describing API shapes. | | |
| ▲ | taeric 6 days ago | parent [-] | | I think it is somewhat of a natural failure of DRY taken to the extreme? People seem to want to describe the API once and have clients and implementations generated from it. It is amusing, in many ways. This is specifically part of what WSDL aspired to, but people were betrayed by the big companies never agreeing on common ground for what shapes they would support in a description. |
| ▲ | instig007 6 days ago | parent | prev [-] | | > Let's say you want to add a new required field: if your system receives messages from clients, some of them may still be sending old data without the field, and now the parse step fails because it detects a missing field. A parser doesn't inherently have to fail (compatibility mode), lose the new field (passthrough mode), or allow divergence (strict mode). The fact that capnproto/parser authors don't realize that the same single protocol can operate in these three scenarios at the same time (strictly speaking: at boundaries vs. in middleware) should not lead you to think that there are problems with required fields in protocols. This is one of the most bizarre kinds of FUD in the industry. | | |
| ▲ | kentonv 6 days ago | parent [-] | | Hi, I'm the apparently-FUD-spreading Cap'n Proto author. Sure! You could certainly imagine extending Protobuf or Cap'n Proto with a way to specify validation that only happens when you explicitly request it. You'd then have separate functions to parse vs. to validate a message, and then you can perform strict validation at the endpoints but skip it in middleware. This is a perfectly valid feature idea which many people have entertained and even implemented successfully. But I tend to think it's not worth trying to have this in the schema language because in order to support every kind of validation you might want, you end up needing a complete programming language. Plus different components might have different requirements and therefore need different validation (e.g. middleware vs. endpoints). In the end I think it is better to write any validation functions in your actual programming language. But I can certainly see where people might disagree. | | |
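A sketch of the parse-vs-validate split described above, in TypeScript. The names (decodeOrder, validateOrder) are illustrative, not any real generated API, and JSON stands in for the actual wire decoding; the point is that middleware can decode and forward without validating, while an endpoint runs the strict check explicitly.

    interface Order {
      id?: string;
      totalCents?: number;
    }

    // Lenient decode step: parsing never enforces "required".
    function decodeOrder(wire: string): Order {
      return JSON.parse(wire) as Order;
    }

    // Strict validation is a separate, explicit function, run only where you want it.
    function validateOrder(o: Order): asserts o is Required<Order> {
      if (o.id === undefined) throw new Error("order.id is required");
      if (o.totalCents === undefined || o.totalCents < 0) {
        throw new Error("order.totalCents must be a non-negative number");
      }
    }

    // An endpoint decodes, then validates before running business logic;
    // middleware could call decodeOrder alone and pass the message along untouched.
    function handleAtEndpoint(wire: string): number {
      const order = decodeOrder(wire);
      validateOrder(order); // strict check at the boundary only
      return order.totalCents; // now known to be present
    }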
| ▲ | lostdog 6 days ago | parent [-] | | It gets super frustrating to have to empty/null check fields everywhere you use them, especially for fields that are effectively required for the message to make sense. A very common example I see is Vec3 (just x, y, z). In proto2 you should be checking for the presence of x,y,z every time you use them, and when you do that in math equations, the incessant existence checks completely obscure the math. Really, you want to validate the presence of these fields during the parse. But in practice, what I see is either just assuming the fields exist in code and crashing on null, or admitting that protos are too clunky to use, and immediately converting every proto into a mirror internal type. It really feels like there's a major design gap here. Don't get me started on the moronic design of proto3, where every time you see Vec3(0,0,0) you get to wonder whether it's the right value or mistakenly unset. | | |
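To make the Vec3 complaint concrete, here is a hypothetical proto2-style Vec3 in TypeScript where every component is optional; honoring presence at every use buries a one-line dot product under existence checks.

    interface Vec3 {
      x?: number;
      y?: number;
      z?: number;
    }

    // What "check presence every time you use it" looks like in math code:
    function dot(a: Vec3, b: Vec3): number {
      if (
        a.x === undefined || a.y === undefined || a.z === undefined ||
        b.x === undefined || b.y === undefined || b.z === undefined
      ) {
        throw new Error("Vec3 is missing a component");
      }
      return a.x * b.x + a.y * b.y + a.z * b.z;
    }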
| ▲ | kentonv 6 days ago | parent [-] | | > It gets super frustrating to have to empty/null check fields everywhere you use them, especially for fields that are effectively required for the message to make sense. That's why Protobuf and Cap'n Proto have default values. You should not bother checking for presence of fields that are always supposed to be there. If the sender forgot to set a field, then they get the default value. That's their problem. > just assuming the fields exist in code and crashing on null There shouldn't be any nulls you can crash on. If your protobuf implementation is returning null rather than a default value, it's a bad implementation, not just frustrating to use but arguably insecure. No implementation of mine ever worked that way, for sure. | | |
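The default-value behavior described above, sketched with the same hypothetical Vec3: an absent numeric field simply reads as 0, so downstream math never sees null or undefined. (Whether that silently hides a sender bug is exactly the disagreement in the next comments.)

    interface Vec3Wire {
      x?: number;
      y?: number;
      z?: number;
    }

    // Stand-in for a decoder that never returns null for scalar fields:
    // anything the sender left unset comes back as its default (0).
    function withDefaults(v: Vec3Wire): { x: number; y: number; z: number } {
      return { x: v.x ?? 0, y: v.y ?? 0, z: v.z ?? 0 };
    }

    const a = withDefaults({ x: 1 });             // sender forgot y and z
    const b = withDefaults({ x: 2, y: 3, z: 4 });
    const d = a.x * b.x + a.y * b.y + a.z * b.z;  // 2 -- no null checks, no crash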
| ▲ | lostdog 5 days ago | parent [-] | | Sadly, the default values are an even bigger source of bugs. We just caught another one at $work where a field was never being filled in, but the default values made it look fine. It caused hidden failures later on. It's an incredibly frustrating "feature" to deal with, and causes lots of problems in proto3. | | |
| ▲ | kentonv 5 days ago | parent [-] | | You can still verify presence explicitly if you want, with the `has` methods. But if you don't check, it should return a default value rather than null. You don't want your server to crash on bad input. |
| ▲ | iamdelirium 7 days ago | parent | prev | next [-] | | You think you do but you really don't. What happens if you mark a field as required and then you need to delete it in the future? You can't, because if someone stored that proto somewhere and no longer sees the field, you've just broken their code. | | |
| ▲ | thinkharderdev 7 days ago | parent | next [-] | | If you need to deserialize an old version then it's not a problem; the unknown field is just ignored during deserialization. The problem is adding a required field, since some clients might still be sending the old version of the message during the rollout. But in some situations you can be pretty confident that a field will always be required. And if you turn out to be wrong then it's not a huge deal. You add the new field as optional first (with all upgraded clients setting the value) and then, once that is rolled out, you make it required. And if a field is in fact semantically required (like the API cannot process a request without the data in that field) then making it optional at the interface level doesn't really solve anything. The message gets deserialized, but if the field is not set it's just an immediate error, which doesn't seem much worse to me than a deserialization error. | | |
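A sketch of that two-step rollout in TypeScript, using a hypothetical region field being added to a request: in phase one the server tolerates its absence while old clients drain, and in phase two absence becomes a hard error, i.e. the field is effectively required from then on.

    interface CreateRequest {
      name?: string;
      region?: string; // newly added field
    }

    // Phase 1: the field is optional on the wire; old clients may omit it,
    // so the server falls back to a default (hypothetical value).
    function handlePhase1(req: CreateRequest): string {
      const region = req.region ?? "us-east-1";
      return `creating ${req.name ?? "?"} in ${region}`;
    }

    // Phase 2: all clients now set the field, so absence is a hard error.
    function handlePhase2(req: CreateRequest): string {
      if (req.region === undefined) {
        throw new Error("region is required");
      }
      return `creating ${req.name ?? "?"} in ${req.region}`;
    }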
| ▲ | iamdelirium 7 days ago | parent [-] | | 1. Then it's not really required if it can be ignored. 2. This is the problem: software (and protos) can live for a long time. They might be used by other clients elsewhere that you don't control. What you thought was required might not be 10 years down the line. What you "think" is not a huge deal then becomes a huge deal and can cause downtime. 3. You're mixing business logic and over-the-wire field requirements. If a message is required for an interface to function, you should be checking it anyway and returning the correct error. How does that change with proto supporting required? | | |
| ▲ | thinkharderdev 6 days ago | parent [-] | | > Then it's not really required if it can be ignored. It can be required in v2 but not in v1, which was my point. If the client is running v2 while the server is still on v1 temporarily, then there is no problem: the server just ignores the new field until it is upgraded. > This is the problem: software (and protos) can live for a long time. They might be used by other clients elsewhere that you don't control. What you thought was required might not be 10 years down the line. What you "think" is not a huge deal then becomes a huge deal and can cause downtime. Part of this is just that trying to create a format that is suitable both as an RPC wire serialization format and ALSO as a long-term storage format leads to something that is not great for either use case. But even taking that into account, RDBMSs have been dealing with this problem for decades and every RDBMS lets you define fields as non-nullable. > If a message is required for an interface to function, you should be checking it anyway and returning the correct error. How does that change with proto supporting required? That's my point: you have to do that check in code, which clutters the implementation with validation noise. That, and you often can't use the wire message as your internal domain model, since you now have to do that defensive null check everywhere the object is used. Aside from that, protocol buffers are an interface definition language, so they should be able to encode at least some of the validation logic (make invalid states unrepresentable and all that). If you are just looking at the proto IDL you have no way of knowing whether a field is really required or not, because there is no way to specify that. |
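A sketch of the boundary conversion mentioned above: a hypothetical wire type in which everything is optional gets mapped once into an internal domain type whose requirements are encoded in the types themselves, so the rest of the code never repeats the null checks.

    // Wire shape: the IDL cannot express which of these are actually required.
    interface OrderWire {
      id?: string;
      customerId?: string;
      note?: string; // genuinely optional
    }

    // Internal domain model: requirements live in the type.
    interface Order {
      id: string;
      customerId: string;
      note?: string;
    }

    // One conversion at the boundary instead of defensive checks everywhere.
    function toDomain(wire: OrderWire): Order {
      if (wire.id === undefined || wire.customerId === undefined) {
        throw new Error("order is missing required fields");
      }
      return { id: wire.id, customerId: wire.customerId, note: wire.note };
    }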
| ▲ | ozgrakkurt 7 days ago | parent | prev [-] | | Maybe you don’t delete it then? | | |
| ▲ | taeric 7 days ago | parent [-] | | I mean, this is essentially the same lesson that database admins learn with nullable fields. Often it isn't the "deleting one is hard" so much as "adding one can be costly." It isn't that you can't do it. But the code side of the equation is the cheap side. |
| ▲ | taeric 7 days ago | parent | prev [-] | | To add to the sibling, I've seen this with Java enums a lot. People will add one so that values are consumed through the enum as soon as possible. This works well as long as the value never comes from data; as soon as it does, you lose the ability to add new values in a rolling-release way. It can be very frustrating to know that we can't push a producer of a new value before we first change all consumers, even if all consumers already use switch statements with default clauses to exhaustively cover behavior. | | |
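A TypeScript sketch of the rolling-release problem described above, mimicking the common Java pattern where mapping a wire string onto an enum throws on unknown values (the Status type and its values are hypothetical): the parse step rejects the new value before any consumer's switch/default ever runs, so producers can't ship new values first.

    type Status = "ACTIVE" | "SUSPENDED";
    const KNOWN = new Set<string>(["ACTIVE", "SUSPENDED"]);

    // The failure mode: the enum mapping throws on values it doesn't know,
    // so a new producer value breaks old consumers at parse time.
    function parseStatus(raw: string): Status {
      if (!KNOWN.has(raw)) throw new Error(`unknown status: ${raw}`);
      return raw as Status;
    }

    function describe(s: Status): string {
      switch (s) {
        case "ACTIVE": return "account is active";
        case "SUSPENDED": return "account is suspended";
        default: return "unknown"; // never reached: parseStatus already threw
      }
    }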
| ▲ | thinkharderdev 7 days ago | parent [-] | | But this is something you should be able to handle on a case-by-case basis. If you have a type which is stored durably as protobuf then adding required fields is much harder. But if you are just dealing with transient RPC messages then it can be done relatively easily in a two-step process. First you add the field as optional, and then once all producers are upgraded (and setting the new field), you make it required. It's annoying for sure, but still seems better than having everything optional always and needing to deal with that in application code everywhere. | | |
| ▲ | taeric 7 days ago | parent [-] | | Largely true. If you are at Google scale, odds are you have mixed fleets deployed, so it's a bit of an involved process. But it is well defined and doable. I think a lot of us would rather not do a dance we don't have to? | | |
| ▲ | thinkharderdev 6 days ago | parent [-] | | Sure, you just have to balance that against the cost of a poorly specified API interface: the errors that come from clients not being clear on what is really required, what they should consider an error if it is not set, etc. And of course all the boilerplate code that you have to write to convert the interface model to an internal domain model you can actually use inside your code. |