kstenerud 7 hours ago

This is great! The more human-readable, the better!

I've also been working in the other direction, making JSON more machine-readable:

https://github.com/kstenerud/bonjson/

It has EXACTLY the same capabilities and limitations as JSON, so it works as a drop-in replacement that's 35x faster for a machine to read and write.

No extra types. No extra features. Anything JSON can do, it can do. Anything JSON can't do, it can't do.
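
To give a feel for what "drop-in" means in code, here's a rough Python sketch (the bonjson module name is purely illustrative; the actual API depends on whichever BONJSON binding you use):

    import json
    # import bonjson   # hypothetical drop-in codec with a json-style dumps/loads surface

    doc = {"id": 7, "tags": ["a", "b"], "price": 19.99}

    text = json.dumps(doc)          # today: text JSON
    # blob = bonjson.dumps(doc)     # the intended swap: same data model, binary wire format
    assert json.loads(text) == doc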

esrauch 5 hours ago | parent | next [-]

This is very interesting, though the limitations imposed for 'security' reasons seem somewhat surprising to me given the claim "Anything JSON can do, it can do. Anything JSON can't do, it can't do."

Simplest example: "a\u0000b" is a perfectly valid and in-bounds JSON string that valid JSON data sets may contain. Doesn't it fall short of "Anything JSON can do, it can do" to refuse to serialize that string?
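
To make that concrete (Python shown, but any stock JSON library behaves the same way):

    import json

    s = json.loads('"a\\u0000b"')   # valid JSON: a 3-character string containing a NUL
    print(len(s), s == "a\x00b")    # -> 3 True
    print(json.dumps(s))            # -> "a\u0000b" (round-trips fine in plain JSON)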

kstenerud 2 hours ago | parent [-]

"a\u0000b" ("a" followed by a vertical tabulation control code) is also a perfectly valid and in-bounds BONJSON string. What BONJSON rejects is any invalid UTF-8 sequences, which shouldn't even be present in the data to begin with.

wizzwizz4 6 minutes ago | parent [-]

You're thinking of "a\u000b". "a\u0000b" is the three-character string also written "a\x00b".
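
In Python terms:

    print(len("a\u000b"), repr("a\u000b"))    # -> 2 'a\x0b'  (vertical tab)
    print(len("a\u0000b"), repr("a\u0000b"))  # -> 3 'a\x00b' (NUL between 'a' and 'b')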

kreco 6 hours ago | parent | prev | next [-]

Can you tell me what the context was that led you to create this?

Unrelated JSON experience:

I worked on a serializer which saves/loads JSON files as well as binary files (using a common interface).

From my own use case I found JSON to be restrictive for no benefit (because I don't use it in a JavaScript ecosystem).

So I changed the JSON format into something way more lax (optional commas, optional colons, optional quotes, multi-line strings, comments).
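
Roughly in this spirit (illustrative only, not the exact syntax):

    // comments allowed
    window {
      title "My App"        // no colons, no trailing commas
      theme dark            // quotes optional
      size [1280 720]
      about """
        a multi-line
        string
      """
    }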

I wish we would stop pretending JSON is a good human-readable format outside of where it makes sense, and that we had a standard alternative for those non-JSON-centric cases.

I know a lot of formats already exist, but none has really taken off so far.

Sardtok 2 hours ago | parent | next [-]

Have you heard of EDN? It's mostly used in Clojure and ClojureScript, as it is to Clojure what JSON is to JS.

If you need custom data types, you can use tagged elements, but that requires you to have functions registered to convert the data type to/from representable values (often strings).

It natively supports quite a bit more than JSON does, without writing custom data readers/writers.
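
A small example of what it looks like (just a sketch):

    {:name    "Ada"
     :roles   #{:admin :author}            ; keywords and sets have no JSON equivalent
     :created #inst "1815-12-10T00:00:00Z"}  ; built-in tagged element for instants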

https://github.com/edn-format/edn

kstenerud 6 hours ago | parent | prev [-]

Basically, for better or worse JSON is here to stay. It exists in all standard libraries. Swift's codec system revolves around it (it only handles types that are compatible with JSON).

It sucks, but we're stuck with JSON. So the idea here is to make it suck a little less by stopping all this insane text processing for data that never ever meets a human directly.

The progression I envisage is:

1. Dev reaches for JSON because it's easy and ubiquitous.

2. Dev switches to BONJSON because it's more efficient and requires no changes to their code other than changing the codec library.

3. Dev switches to a sane format after the complexity of their app reaches a certain level where a substantial code change is warranted.

kreco 5 hours ago | parent [-]

Thanks for the details!

eric-p7 3 hours ago | parent | prev | next [-]

Reminds me of Lite3 that was posted here not long ago:

https://github.com/fastserial/lite3

krick an hour ago | parent | prev | next [-]

What about compression rates?

zzo38computer an hour ago | parent | prev | next [-]

I think JSON is too limited and has some problems, so BONJSON has mostly the same problems. There are many other formats as well, some of which add additional types beyond JSON and some of which don't. Also, a few programs may expect (and possibly require) files that contain invalid UTF-8, even though that is not proper JSON (I think it would be better if they did not use JSON, due to this and other issues), so there is that too. Using normalized Unicode has its own problems, as does allowing 64-bit integers when some programs expect them and others don't. JSON and Unicode are just not good formats, in general. (There is also an issue with JSON.stringify(-0), but that is an issue with JavaScript that does not seem to be relevant to BONJSON, as far as I can tell.)

Nevertheless, I believe your claims are mostly accurate, except for a few issues with which things are allowed or not allowed, due to JavaScript and other things (although in some of these cases, the BONJSON specification allows options to control this). Sometimes rejecting certain things is helpful, but not always; for example, sometimes you do want to allow mismatched surrogates, and sometimes you might want to allow null characters. (The defaults are probably reasonable, but are often the result of a bad design anyway, as I mentioned above.) Also, the top of the specification says it is safe against many attacks, but these are a property of the implementation, which would also be the case if you implemented JSON or other formats (although the BONJSON specification does specify that implementations are supposed to check for these things to make them safe).

(The issue of overlong UTF-8 encodings in IIS web servers is another security issue, which comes from using a different format for validation than for actual usage. In this case there are actually two usages, because one of them is the handling of relative URLs (using the ASCII format) and the other is the handling of file names on the server (which might be using UTF-16 there; in addition there is the internal splitting of file paths into individual pieces and the internal handling of relative file paths). There are reasons both to avoid and to check for overlong UTF-8 encodings, although this is a different, more general issue than the character encoding itself.)
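
The classic overlong case, shown with Python's strict decoder (C0 AF would decode to "/" only if a decoder wrongly accepted overlong forms):

    >>> b"\xc0\xaf".decode("utf-8")
    Traceback (most recent call last):
      ...
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte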

Another issue is canonical forms; the canonical form of JSON can be messy, especially for numbers (I don't know what the canonical form for numbers in JSON is, but I read that apparently it is complicated).

I think DER is better. BONJSON is more compact, but that also makes the framing more complicated to handle than DER (which uses consistent framing for all types). I also wrote a program to convert JSON to DER (I also made up some nonstandard types, although the conversion from JSON to DER only uses one of these nonstandard types (key/value list); the other types it needs are standard ASN.1 types). Furthermore, DER is already a canonical form (and I had made up SDER and SDSER for when you do not want canonical form but also do not want the messiness of BER; SDSER does have chunking and does not require the length to be known ahead of time, so it is more like BONJSON in these ways). Because of the consistent framing, you can easily ignore any types that you do not use; even though there are many types, you do not necessarily need all of them.
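
To illustrate the consistent framing, a minimal sketch in Python (short-form lengths only; real DER also has long-form lengths):

    def skip_element(buf, i):
        # DER is tag-length-value; the tag at buf[i] can be ignored if the type is unknown
        length = buf[i + 1]      # short-form length; long form would be 0x81 nn, 0x82 nn nn, ...
        return i + 2 + length    # index of the next element

    # e.g. 02 01 05 is INTEGER 5, and 0C 02 68 69 is UTF8String "hi"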

imiric 7 hours ago | parent | prev [-]

That's neat, but I'm much more intrigued by your Concise Encoding project[1]. I see that it only has a single Go reference implementation that hasn't been updated in 3 years. Is the project still relevant?

Thanks for sharing your work!

[1]: https://concise-encoding.org/

kstenerud 6 hours ago | parent [-]

Thanks!

I'm actually having second thoughts with Concise Encoding. It's gotten very big with all the features it has, which makes it less likely to be adopted (people don't like new things).

I've been toying around with a less ambitious format called ORB: https://github.com/kstenerud/orb

It's essentially an extension of BONJSON (so it can read BONJSON documents natively) that adds extra types and features.

I'm still trying to decide what types will actually be of use in the real world... CE's graph type is cool, but if nobody uses it...

zzo38computer an hour ago | parent [-]

I use ASN.1X, so I use some types that those other formats do not have. Some of the types of ASN.1 are: unordered set, ISO 2022 string, object identifier, bit string. I added some additional types in ASN.1X, such as: TRON string, rational numbers, key/value list (with any types for keys and for values (and the types of keys do not necessarily have to match); for one thing, keys do not have to be Unicode), and references to other nodes. However, ASN.1 (and ASN.1X) does not distinguish between qNaN and sNaN. I had also made up TER, which is a text format that can be converted to DER (like how ORT can be converted to ORB, although it works differently and is not compatible with JSON; TER somewhat resembles PostScript).

Your extensions of JSON with comments, hexadecimal notation, optional commas, etc. are useful though (my own program to convert JSON to DER does treat commas as spaces, although that is an implementation detail).