Remix.run Logo
aseipp 5 days ago

> JSON contains a description of the structure of the data that is readable by both machines and humans.

No, it by definition does not, because JSON has no schema. Only your application contains and knows the (expected) structure of the data, but you literally cannot know what structure any random blob of JSON objects will have without a separate schema. When you read a random /docs page telling you "the structure of the resulting JSON object from this request is ...", that's just a schema but written in English instead of code. This has big downstream ramifications.

For example, many APIs make the mistake of parsing JSON and only returning some opaque "Object" type, which you then have to map onto your own domain objects, meaning you actually parse every JSON object twice: once into the opaque structure, and once into your actual application type. This has major efficiency ramifications when you are actually dealing with a lot of JSON. The only way to do better than this is to have a schema in some form -- any form at all, even English prose -- so you can go from the JSON text representation directly into your domain type at parse-time. This is part of the reason why so many JSON libraries in every language tend to have some high level way of declaring a JSON object in the host language, typically as some kind of 'struct' or enum, so that they can automatically derive an actually efficient parsing step and skip intermediate objects. There's just no way around it. JSON doesn't have any schema, and that's part of its appeal, but in practice one always exists somewhere.

You can use protobuf in text-based form too, but from what you said, you're probably screwed anyway if your coworkers are just churning stuff and changing the values of fields and stuff randomly. They're going to change the meaning of JSON fields willy nilly too and there will be nothing to stop you from landing back in step 1.

I will say that the quality of gRPC integrations tends to vary wildly based on language though, which adds debt, you're definitely right about that.

andy_ppp 5 days ago | parent | next [-]

If I gave you a JSON object with name, age, position, gender etc. etc. would you not say it has structure? If I give you a GRPC binary you need the separate schema and tools to be able to comprehend it. That’s all I’m saying is the separation of the schema from some minimal structure makes the debugging of services more difficult. I would also add the GRPC implementation I used in Javascript (long ago) was not actually checking the types of the field in a lot of cases so rather than being a schema that rejects if some field is not a text field it would just return binary junk. JSON Schema or almost anything else will give you a parsing error instead.

Maybe the tools are fantastic not but I still think being able to debug messages without them is an advantage in almost all systems, you probably don’t need the level of performance GRPC provides.

If you’re using JSON Protobufs why would you add this extra complexity - it will mean messaging is just as slow as using JSON. What are the core advantages of GRPC under these conditions?

aseipp 5 days ago | parent [-]

> If I gave you a JSON object with name, age, position, gender etc. etc. would you not say it has structure?

That's too easy. What if I give you a 200KiB JSON object with 40+ nested fields that's whitespace stripped and has base64 encoded values? Its "structure" is a red herring. It is not a matter of text or binary. The net result is I still have to use a tool to inspect it, even if that's only something like gron/jq in order to make it actually human readable. But at the end of the day the structure is a concern of the application, I have to evaluate its structure in the context of that application. I don't just look at JSON objects for fun. I do it mostly to debug stuff. I still need the schematic structure of the object to even know what I need to write.

FWIW, I normally use something like grpcurl in order to do curl-like requests/responses to a gRPC endpoint and you can even have it give you the schema for a given service. This has worked quite well IME for almost all my needs, but I accept with this stuff you often have lots of "one-off" cases that you have to cobble stuff together or just get dirty with printf'ing somewhere inside your middleware, etc.

> I would also add the GRPC implementation I used in Javascript (long ago) was not actually checking the types of the field in a lot of cases so rather than being a schema that rejects if some field is not a text field it would just return binary junk. JSON Schema or almost anything else will give you a parsing error instead.

Yes, I totally am with you on this. Many of the implementations just totally suck and JSON is common enough nowadays that you kind of have to at least have something that doesn't completely fall over, if you want to be taken remotely seriously. It's hard to write a good JSON library, but it's definitely harder to write a good full gRPC stack. I 100% have your back on this. I would probably dislike gRPC even more but I'm lucky enough to use it with a "good" toolkit (Rust/Prost.)

> If you’re using JSON Protobufs why would you add this extra complexity - it will mean messaging is just as slow as using JSON. What are the core advantages of GRPC under these conditions?

I mean, if your entire complaint is about text vs binary, not efficiency or correctness, JSON Protobuf seems like it fits your needs. You still get the other benefits of gRPC you'd have anywhere (an honest-to-god schema, better transport efficiency over mandated HTTP/2, some amount of schema-generic middleware, first-class streaming, etc etc.)

FWIW, I don't particularly love gRPC. And while I admit I loathe JSON, I'm mainly pushing back on the notion that JSON has some "schema" or structure. No, it doesn't! Your application has and knows structure. A JSON object is just a big bag of stuff. For all its failings, gRPC having a schema is a matter of it actually putting the correct foot first and admitting that your schema is real, it exists, and most importantly can be written down precisely and checked by tools!

imtringued 5 days ago | parent | prev [-]

Here are some sad news for you: The flexibility of JSON and CBOR cannot be matched by any schema based system, because it is equivalent to giving up that advantage.

Sure, the removal of a field can cause an application level error, but that is probably the most benign form of failure there is. What's worse is when no error occurs and the data is simply reinterpreted to fit the schema. Then your database will slowly fill up with corrupted garbage data and you'll have to restore from a backup.

What you have essentially accomplished in your response is to miss the entire point.

There are also other problems with protobuf in the sense that the savings aren't actually as big as you'd expect. E.g. there is still costly parsing, the data transmitted over the wire isn't significantly smaller unless you have data that is a poor fit for JSON.

jonathanberi 5 days ago | parent | next [-]

It's also worth noting CDDL [1], which adds schema-like utility to CBOR (and technically JSON.) We've started to use it in more places where we use CBOR.

[1] https://datatracker.ietf.org/doc/rfc8610/

5 days ago | parent | prev [-]
[deleted]