eadmund 2 days ago

> So, yes, instead of saying that "e" equals "65537", you're saying that "e" equals "AQAB". Aren't you glad you did those extra steps?

Oh JSON.

For those unfamiliar with the reason here, it’s that JSON parsers cannot be relied upon to treat numbers properly. Is 4723476276172647362476274672164762476438 a valid JSON number? Yes, of course it is. What will a JSON parser do with it? Probably silently truncate it to a 64-bit or 63-bit integer or a float, or, if you’re very lucky, emit an error (a good JSON decoder written in a sane language like Common Lisp would of course just return the number, but few of us are so lucky).
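
To illustrate (Python here, only as a stand-in for any parser that stores numbers as doubles): Python's own json module returns the exact integer, but forcing the value through a double shows what most parsers silently do:

    n = 4723476276172647362476274672164762476438
    d = float(n)           # what a double-only parser effectively keeps
    print(int(d) == n)     # False: the low-order digits are silently lost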

So the only way to reliably get large integers into and out of JSON is to encode them as something else. Base64-encoded big-endian bytes is not a terrible choice. Silently doing the wrong thing is the root of many security errors, so it is not wrong to treat every number in the protocol this way. Of course, one then loses the readability of JSON.
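
For instance, decoding the "AQAB" from the article back into 65537 is a few lines of Python (the padding fix-up is there because JWK base64url omits the trailing "="):

    import base64

    s = "AQAB"
    raw = base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))  # b'\x01\x00\x01'
    print(int.from_bytes(raw, "big"))                        # 65537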

JSON is better than XML, but it really isn’t great. Canonical S-expressions would have been far preferable, but for whatever reason the world didn’t go that way.

cortesoft 2 days ago | parent | next [-]

> Canonical S-expressions would have been far preferable, but for whatever reason the world didn’t go that way.

I feel like not understanding why JSON won out is being intentionally obtuse. JSON can easily be hand written, edited, and read for most data. Canonical S-expressions are not as easy to read and are much harder to write by hand; having to prefix every atom with a length makes it very tedious to write by hand. If you have a JSON object you want to hand edit, you can just type... for a canonical S-expression, you have to count how many characters you are typing/deleting, and then update the prefix.

You might not think the ability to hand generate, read, and edit is important, but I am pretty sure that is a big reason JSON has won in the end.

Oh, and the Ruby JSON parser handles that large number just fine.

motorest 19 hours ago | parent | next [-]

> I feel like not understanding why JSON won out is being intentionally obtuse. JSON can easily be hand written, edited, and read for most data.

You are going way out of your way to try to come up with ways to rationalize why JSON was a success. The ugly truth is far simpler than what you're trying to sell: it was valid JavaScript. JavaScript WebApps could parse JSON with a call to eval(). No deserialization madness like XML, no need to import a parser. Just fetch a file, pass it to eval(), and you're done.

nextaccountic 8 hours ago | parent | next [-]

In other words, the thing that made JSON initially succeed was also a giant security hole

motorest 6 hours ago | parent [-]

> In other words, the thing that made JSON initially succeed was also a giant security hole

Perhaps, but it's not a major concern when you control both the JavaScript frontend and whatever backend it consumes. In fact, arguably this technique is still pretty much in use today with the way WebApps get a hold of CSRF tokens. In this scenario security is a lesser concern than, say, input validation.

jaapz 14 hours ago | parent | prev | next [-]

But also, all the other reasons written by the person you replied to

motorest 6 hours ago | parent [-]

> But also, all the other reasons written by the person you replied to

Not really. JSON's mass adoption is tied to JavaScript's mass adoption, where sheer convenience and practicality dictated its whole history and most of its current state. Sending JavaScript fragments from the backend is a technique that didn't really stop being used just because someone rolled out a JSON parser.

I think some people feel compelled to retroactively make this whole thing more refined and elegant because for some the ugly truth is hard to swallow.

amne 17 hours ago | parent | prev [-]

it's in the name after all: [j]ava[s]cript [o]bject [n]otation

eadmund 2 days ago | parent | prev | next [-]

> I feel like not understanding why JSON won out is being intentionally obtuse.

I didn’t feel like my comment was the right place to shill for an alternative, but rather to complain about JSON. But since you raise it.

> JSON can easily be hand written, edited, and read for most data.

So can canonical S-expressions!

> Canonical S-expressions are not as easy to read and are much harder to write by hand; having to prefix every atom with a length makes it very tedious to write by hand.

Which is why the advanced representation exists. I contend that this:

    (urn:ietf:params:acme:error:malformed
     (detail "Some of the identifiers requested were rejected")
     (subproblems ((urn:ietf:params:acme:error:malformed
                    (detail "Invalid underscore in DNS name \"_example.org\"")
                    (identifier (dns _example.org)))
                   (urn:ietf:params:acme:error:rejectedIdentifier
                    (detail "This CA will not issue for \"example.net\"")
                    (identifier (dns example.net))))))
is far easier to read than this (the first JSON in RFC 8555):

    {
        "type": "urn:ietf:params:acme:error:malformed",
        "detail": "Some of the identifiers requested were rejected",
        "subproblems": [
            {
                "type": "urn:ietf:params:acme:error:malformed",
                "detail": "Invalid underscore in DNS name \"_example.org\"",
                "identifier": {
                    "type": "dns",
                    "value": "_example.org"
                }
            },
            {
                "type": "urn:ietf:params:acme:error:rejectedIdentifier",
                "detail": "This CA will not issue for \"example.net\"",
                "identifier": {
                    "type": "dns",
                    "value": "example.net"
                }
            }
        ]
    }
> for a canonical S-expression, you have to count how many characters you are typing/deleting, and then update the prefix.

As you can see, no you do not.

thayne a day ago | parent | next [-]

Your example uses S-expressions, not canonical S-expressions. Canonical S-expressions[1] are basically a binary format. Each atom/string is prefixed by the decimal length of the string and a colon. Their advantage over regular S-expressions is that there is no need to escape or quote strings with whitespace, and there is only a single possible representation for a given data structure. The disadvantage is that they are much harder for humans to read and write.
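
For a concrete picture, a toy Python encoder for the canonical form (just the length-prefix rule, nothing else):

    def atom(b: bytes) -> bytes:
        # canonical form: decimal length, a colon, then the raw bytes
        return str(len(b)).encode() + b":" + b

    # (identifier (dns example.net)) written canonically
    enc = b"(" + atom(b"identifier") + b"(" + atom(b"dns") + atom(b"example.net") + b"))"
    print(enc)  # b'(10:identifier(3:dns11:example.net))'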

As for S-expressions vs JSON, there are pros and cons to each. S-expressions don't have any way to encode type information in the data itself; you need a schema to know whether a certain value should be treated as a number or a string. And it's subjective which is more readable.

[1]: https://en.m.wikipedia.org/wiki/Canonical_S-expressions

eadmund a day ago | parent | next [-]

> Your example uses s-expressions, not canonical s-expressions.

I’ve always used ‘canonical S-expressions’ to refer to Rivest’s S-expressions proposal: https://www.ietf.org/archive/id/draft-rivest-sexp-13.html, a proposal which has canonical, basic transport & advanced transport representations which are all equivalent to one another (i.e., every advanced transport representation has a single canonical representation). I don’t know where I first saw it, but perhaps it was intended to distinguish from other S-expressions such as Lisp’s or Scheme’s?

Maybe I should refer to them as ‘Rivest S-expressions’ or ‘SPKI S-expressions’ instead.

> S-expressions don't have any way to encode type information in the data itself, you need a schema to know if a certain value should be treated as a number or a string.

Neither does JSON, as this whole thread indicates. This applies to other data types, too: while a Rivest expression could be

    (date [iso8601]2025-05-24T12:37:21Z)
JSON is stuck with:

    {
      "date": "2025-05-24T12:37:21Z"
    }
> And it's subjective which is more readable.

I really disagree. The whole reason YAML exists is to make JSON more readable. Within limits, the more data one can have in a screenful of text, the better. JSON is so terribly verbose if pretty-printed that it takes up screens and screens of text to represent a small amount of data — and when not pretty-printed, it is close to trying to read a memory trace.

Edit: updated link to the January 2025 proposal.

antonvs a day ago | parent [-]

That Rivest draft defines canonical S-expressions to be the format in which every token is preceded by its length, so it's confusing to use "canonical" to describe the whole proposal, or use it as a synonym for the "advanced" S-expressions that the draft describes.

But that perhaps hints at some reasons that formats like JSON tend to win popularity contests over formats like Rivest's. JSON is a single format for authoring and reading, which doesn't address transport at all. The name is short, pronounceable (vs. "spikky" perhaps?), and clearly refers to one thing - there's no ambiguity about whether you might be talking about a transport encoding instead.

I'm not saying these are good reasons to adopt JSON over SPKI, just that there's a level of ambition in Rivest's proposal which is a poor match for how adoption tends to work in the real world.

There are several mechanisms for JSON transport encoding, including plain old gzip, but also more specific formats like MessagePack. There isn't one single standard for it, but as it turns out, that really isn't that important.

Arguably there's a kind of violation of separation of concerns happening in a proposal that tries to define all these things at once: "a canonical form ... two transport representations, and ... an advanced format".

wat10000 20 hours ago | parent | next [-]

JSON also had the major advantage of having an enormous ecosystem from day 1. It was ugly and kind of insecure, but the fact that every JavaScript implementation could already parse and emit JSON out of the box was a huge boost. It’s hard to beat that even if you have the best format in the world.

antonvs 18 hours ago | parent [-]

Haha yes, that does probably dwarf any other factors.

But still, I think if the original JSON spec had been longer and more comprehensive, along the lines of Rivest's, that could have limited JSON's popularity, or resulted in people just ignoring parts of it and focusing on the parts they found useful.

The original JSON RFC-4627 was about 1/3rd the size of the original Rivest draft (a body of 260 lines vs. 750); it defines a single representation instead of four; and e.g. the section on "Encoding" is just 3 sentences. Here it is, for reference: https://www.ietf.org/rfc/rfc4627.txt

wat10000 18 hours ago | parent [-]

We already see that a little bit. JSON in theory allows arbitrary decimal numbers, but in practice it’s almost always limited to numbers that are representable as an IEEE-754 double. It used to allow UTF-16 and UTF-32, but in practice only UTF-8 was widely accepted, and that eventually got reflected in the spec.
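
The cutoff is 2^53; one line makes it visible (Python floats are the same IEEE-754 doubles):

    print(float(2**53) == float(2**53 + 1))  # True: 9007199254740993 has no double of its own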

I’m sure you’re right. If even this simple spec exceeded what people would actually use as a real standard, surely anything beyond that would also be left by the wayside.

kevin_thibedeau 21 hours ago | parent | prev [-]

> clearly refers to one thing

Great, this looks like JSON. Is it JSON5? Does it expect bigint support? Can I use escape chars?

antonvs 19 hours ago | parent [-]

You're providing an example of my point. People don't, in general, care about any of that, so "solving" those "problems" isn't likely to help adoption.

To your specific points:

1. JSON5 didn't exist when JSON adoption occurred, and in any case they're pretty easy to tell apart, because JSON requires keys to be quoted. This is a non-problem. Why do you think it might matter? Not to mention that the existence of some other format that resembles JSON is hardly a reflection on JSON itself, except perhaps as a compliment to its perceived usefulness.

2. Bigint support is not a requirement that most people have. It makes no difference to adoption.

3. Escape character handling is pretty well defined in ECMA 404. Your point is so obscure I don't even know specifically what you might be referring to.

thayne 16 hours ago | parent [-]

I agree with most of what you said, but JSON's numbers are problematic. For one thing, many languages have 64-bit integers, which can't be precisely represented as a double, so serializing such a value can lead to subtle bugs if it is deserialized by a parser that only supports doubles. And deserializing in languages that have multiple numeric types is complicated, since the parser often doesn't have enough context to know which numeric type is best to use.
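
A quick illustration of the 64-bit case (Python, but the arithmetic is the same anywhere doubles are involved):

    n = 2**63 - 1                      # largest signed 64-bit integer
    print(float(n) == float(n - 1))    # True: adjacent integers collapse into one double
    print(int(float(n)) == n)          # False: the round trip is lossy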

dietr1ch a day ago | parent | prev [-]

The length thing sounds like an editor problem, but we have wasted too much time in coming up with syntax that pleases personal preferences without admitting we would be better off moving away from text.

xkcd 927 can be avoided, but it's way harder than it seems, which is why we have the proliferation of standards that fail to become universal.

eximius 2 days ago | parent | prev | next [-]

For you, perhaps. For me, the former is denser, but crossing into a "too dense" region. The JSON has indentation which is easy on my poor brain. Also, it's nice to differentiate between lists and objects.

But, I mean, they're basically isomorphic with like 2 things exchanged ({} and [] instead of (); implicit vs explicit keys/types).

josephg a day ago | parent [-]

Yeah. I don’t even blame S-expressions. I think I’ve just been exposed to so much json at this point that my visual system has its own crappy json parser for pretty-printed json.

S expressions may well be better. But I don’t think S expressions are better enough to be able to overcome json’s inertia.

eddythompson80 a day ago | parent | prev | next [-]

> is far easier to read than this (the first JSON in RFC 8555):

It's not for me. I'd literally take anything over csexps. Like there is nothing that I'd prefer it to. If it's the only format around, then I'll just roll my own.

justinclift 21 hours ago | parent [-]

> Like there is nothing that I'd prefer it to.

May I suggest perl regex's? :)

remram 20 hours ago | parent | prev | next [-]

This doesn't help with numbers at all, though. Any textual representation of numbers is going to have the same problem as JSON.

NooneAtAll3 18 hours ago | parent | prev | next [-]

> I contend that this is far easier to read than this

oh boi, that's some Lisp-like vs C-like level of holywar you just uncovered there

and wooow my opinion is opposite of yours

michaelcampbell 21 hours ago | parent | prev [-]

> is far easier to read than this

Readability is a function of the reader, not the medium.

lisper 18 hours ago | parent | prev | next [-]

> Canonical S-expressions are not as easy to read and much harder to write by hand

You don't do that, any more than you read or write machine code in binary. You read and write regular S-expressions (or assembly code) and you translate that into and out of canonical S expressions (or machine code) with a tool (an assembler/disassembler).

cortesoft 16 hours ago | parent [-]

I have written by hand and read JSON hundreds of times. You can tell me I shouldn’t, but I am telling you I do. Messing around with an API with curl, tweaking a request object slightly for testing something, etc.

Reading happens even more times. I am constantly printing out API responses when I am coding, verifying what I am seeing matches what I am expecting, or trying to get an idea of the structure of something. Sure, you can tell me I shouldn’t do this and I should just read a spec, but in my experience it is often much faster just to read the JSON directly. Sometimes the spec is outdated, just plain wrong, or doesn’t exist. Being able to read the JSON is a regular part of my day.

lisper 16 hours ago | parent [-]

I think there may be a terminological disconnect here. S-expressions and canonical S-expressions are not the same thing. S-expressions (non-canonical) are comparable to JSON, intended to be read and written by humans, and actually much easier to read and write than JSON because they use less punctuation.

https://en.wikipedia.org/wiki/S-expression

A canonical S-expression is a binary format, intended to be both generated and parsed by machines, not humans:

https://en.wikipedia.org/wiki/Canonical_S-expressions

pharrington a day ago | parent | prev | next [-]

The entire reason ACME exists is because you are never writing or reading the CSR by hand.

So of course, ACME is based around a format whose entire raison d'être is being written and read by hand.

It's weird.

thayne a day ago | parent [-]

The reason json is a good format for ACME isn't that it is easy to read and write by hand[1], but that most languages have at least one decent json implementation available, so it is easier to implement clients in many different languages.

[1]: although being easy to read by humans is an advantage when debugging why something isn't working.

beeflet 18 hours ago | parent | prev [-]

you can use a program to convert between s-expressions and a more readable format. In a world where canonical s-expressions rule, this "more readable format" would probably be an ordinary s-expression

tsimionescu a day ago | parent | prev | next [-]

This seems like a just-so story. Your explanation could make some sense if we were comparing {"e" : "AQAB"} to {"e" : 65537}, but there is no reason why that should be the alternative. The JSON {"e" : "65537"} will be read precisely the same way by any JSON parser out there. Converting the string "65537" to the number 65537 is exactly as easy (or hard), but certainly unambiguous, as converting the string "AQAB" to the same number.
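
Both conversions really are one-liners; in Python, for example (the base64 call is just there for comparison with the JWK form):

    import base64

    assert int("65537") == 65537
    assert int.from_bytes(base64.urlsafe_b64decode("AQAB"), "big") == 65537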

Of course, if you're doing this in JS and have reasons to think the resulting number may be larger than the precision of a double, you have a huge problem either way. Just as you would if you were writing this in C and thought the number may be larger than what can fit in a long long. But that's true regardless of how you represent it in JSON.

pornel a day ago | parent | next [-]

For very big numbers (that could appear in these fields), generating and parsing a base 10 decimal representation is way more cumbersome than using their binary representation.

The DER encoding used in the TLS certificates uses the big endian binary format. OpenSSL API wants the big endian binary too.

The format used by this protocol is a simple one.

It's almost exactly the format that is needed to use these numbers, except JSON can't store binary data directly. Converting binary to base 64 is a simple operation (just bit twiddling, no division), and it's easier than converting arbitrarily large numbers between base 2 and base 10. The 17-bit value happens to be an easy one, but other values may need thousands of bits.

It would be silly for the sender and recipient to need to use a BigNum library when the sender has the bytes and the recipient wants the bytes, and neither has use for a decimal number.
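
Rough numbers for a 4096-bit modulus (Python sketch; the all-0xFF value is only a stand-in for a real modulus):

    import base64

    raw = b"\xff" * 512                    # 4096 bits, big-endian, as a crypto library holds it
    b64 = base64.urlsafe_b64encode(raw)    # simple bit twiddling over the bytes
    dec = str(int.from_bytes(raw, "big"))  # needs a bignum plus a base-2 to base-10 conversion
    print(len(b64), len(dec))              # 684 1234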

deepsun a day ago | parent | prev [-]

Some parsers, like PHP, may treat 65537 and "65537" the same. Room for vulnerability.

int_19h a day ago | parent [-]

Why would they do so? It's semantically distinct JSON, even JS itself treats it differently?

dwattttt a day ago | parent | next [-]

Time for a trip to the Abbey of Hidden Absurdities.

http://www.thecodelesscode.com/case/161

hiciu a day ago | parent | prev [-]

It's PHP. Handling numbers in PHP is complicated enough that a reasonable person would not trust it by default.

https://www.php.net/manual/en/language.types.numeric-strings...

int_19h 5 hours ago | parent [-]

I know that PHP will treat a string as if it were a number if you try to use it in a context where number is expected; JS does the same thing. But why would that affect JSON deserialization in a way that makes numbers and strings indistinguishable in principle (causing the loss of precision as described here)?

ncruces 2 days ago | parent | prev | next [-]

Go can decode numbers losslessly as strings: https://pkg.go.dev/encoding/json#Number

json.Number is (almost) my “favorite” arbitrary decimal: https://github.com/ncruces/decimal?tab=readme-ov-file#decima...

I'm half joking, but I'm not sure why S-expressions would be better here. There are LISPs that don't do arbitrary precision math.

eadmund a day ago | parent | next [-]

> Go can decode numbers losslessly as strings: https://pkg.go.dev/encoding/json#Number

Yup, and if you’re using JSON in Go you really do need to be using Number exclusively. Anything else will lead to pain.

> I'm half joking, but I'm not sure why S-expressions would be better here. There are LISPs that don't do arbitrary precision math.

Sure, but I’m referring specifically to https://www.ietf.org/archive/id/draft-rivest-sexp-13.html, which only has lists and bytes, and so numbers are always just strings and it’s up to the program to interpret them.

mise_en_place 2 days ago | parent | prev [-]

For actual SERDES, JSON becomes very brittle. It's better to use something like protobuf or cap'n'proto for such cases.

josephg a day ago | parent | prev | next [-]

The funny thing about this is that JavaScript the language has had support for BigIntegers for many years at this point. You can just write 123n for a bigint of 123.

JSON could easily be extended to support them - but there’s no standards body with the authority to make a change like that. So we’re probably stuck with json as-is forever. I really hope something better comes along that we can all agree on before I die of old age.

While we’re at it, I’d also love a way to embed binary data in json. And a canonical way to represent dates. And comments. And I’d like a sane, consistent way to express sum types. And sets and maps (with non string keys) - which JavaScript also natively supports. Sigh.

aapoalas 20 hours ago | parent | next [-]

It's more a problem of support and backwards compatibility. JSON and parsers for it are so ubiquitous, and the spec completely lacks any versioning support, that adding a feature would be a breaking change of horrible magnitude, on nearly all levels of the modern software infrastructure stack. I wouldn't be surprised if some CPUs might break from that :D

JSON is a victim of its success: it has become too big to fail, and too big to improve.

Sammi 16 hours ago | parent | prev [-]

There are easy workarounds to getting bigints in JSON: https://github.com/GoogleChromeLabs/jsbi/issues/30#issuecomm...

josephg an hour ago | parent [-]

Sure; and I can encode maps and sets as entry lists. Binary data as strings and so on. But I don’t want to. I shouldn’t have to.

The fact remains that json doesn’t have native support for any of this stuff. I want something json-like which supports all this stuff natively. I don’t want to have to figure out if some binary data is base64 encoded or hex encoded or whatever, and hack around jackson or serde or javascript to encode and decode my objects properly. Features like this should be built in.

Sammi 27 minutes ago | parent [-]

Agree. JSON definitely needs an update so we can get better ergonomics built in.

In code you control you can choose to use JSON5: https://json5.org/

marcosdumay 2 days ago | parent | prev | next [-]

What I don't understand is why you (and a lot of other people) just expect S-expression parsers to not have the exact same problems.

eadmund 2 days ago | parent | next [-]

Because canonical S-expressions don’t have numbers, just atoms (i.e., byte sequences) and lists. It is up to the using code to interpret "34" as the string "34" or the number 34 or the number 13,108 or the number 13,363, which is part of the protocol being used. In most instances, the byte sequence is probably a decimal number.
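
Concretely, the three numeric readings of the atom "34" mentioned above (decimal text, big-endian bytes, little-endian bytes):

    atom = b"34"
    print(int(atom))                       # 34     (decimal text)
    print(int.from_bytes(atom, "big"))     # 13108  (0x3334)
    print(int.from_bytes(atom, "little"))  # 13363  (0x3433)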

Now, S-expressions as used for programming languages such as Lisp do have numbers, but again Lisp has bignums. As for parsers of Lisp S-expressions written in other languages: if they want to comply with the standard, they need to support bignums.

tsimionescu a day ago | parent | next [-]

You can write JSON that exclusively uses strings, so this is not really relevant. Sure, maybe it can be considered an advantage that s-expressions force you to do that, though it can also be seen just as easily as a disadvantage. It certainly hurts readability of the format, which is not a 0-cost thing. This is also why all Lisps use more than plain sexps to represent their code: having different syntax for different types helps.

its-summertime 2 days ago | parent | prev | next [-]

"it can do one of 4 things" sounds very much like the pre-existing issue with JSON

motorest 18 hours ago | parent | prev [-]

> Because canonical S-expressions don’t have numbers, just atoms (i.e., byte sequences) and lists.

If types other than string and a list bother you, why don't you stick with those types in JSON?

2 days ago | parent | prev | next [-]
[deleted]
01HNNWZ0MV43FF 2 days ago | parent | prev [-]

I think they mean that Common Lisp has bigints by default

ryukafalz 2 days ago | parent [-]

As do Scheme and most other Lisps I'm familiar with, and integers/floats are typically specified to be distinct. I think we'd all be better off if that were true of JSON as well.

I'd be happy to use s-expressions instead :) Though to GP's point, I suppose we might then end up with JS s-expression parsers that still treat ints and floats interchangeably.

petre a day ago | parent [-]

And in addition to that, they are unable to distinguish between a string "42" and a number 42.

kangalioo 2 days ago | parent | prev | next [-]

But what's wrong with sending the number as a string? `"65537"` instead of `"AQAB"`

comex 2 days ago | parent | next [-]

The question is how best to send the modulus, which is a much larger integer. For the reasons below, I'd argue that base64 is better. And if you're sending the modulus in base64, you may as well use the same approach for the exponent sent along with it.

For RSA-4096, the modulus is 4096 bits = 512 bytes in binary, which (for my test key) is 684 characters in base64 or 1233 characters in decimal. So the base64 version is much smaller.

Base64 is also more efficient to deal with. An RSA implementation will typically work with the numbers in binary form, so for the base64 encoding you just need to convert the bytes, which is a simple O(n) transformation. Converting the number between binary and decimal, on the other hand, is O(n^2) if done naively, or O(some complicated expression bigger than n log n) if done optimally.

Besides computational complexity, there's also implementation complexity. Base conversion is an algorithm that you normally don't have to implement as part of an RSA implementation. You might argue that it's not hard to find some library to do base conversion for you. Some programming languages even have built-in bigint types. But you typically want to avoid using general-purpose bigint implementations for cryptography. You want to stick to cryptographic libraries, which typically aim to make all operations constant-time to avoid timing side channels. Indeed, the apparent ease-of-use of decimal would arguably be a bad thing since it would encourage implementors to just use a standard bigint type to carry the values around.

You could argue that the same concern applies to base64, but it should be relatively safe to use a naive implementation of base64, since it's going to be a straightforward linear scan over the bytes with less room for timing side channels (though not none).

nssnsjsjsjs a day ago | parent [-]

Ah OK so: readable, efficient, consistent; pick 2.

shiandow a day ago | parent | prev | next [-]

Converting large integers to decimal is nontrivial, especially when you don't trust languages to handle large numbers.

Why you wouldn't just use the hexadecimal that everyone else seems to use I don't know. There seems to be a rather arbitrary cutoff where people prefer base64 to hexadecimal.

red_admiral a day ago | parent | prev | next [-]

This sounds like an XY problem to me. There is already an alternative that is at least as secure and only requires a single base-64 string: Ed25519.

deepsun a day ago | parent | prev | next [-]

PHP (at least old versions I worked with) treats "65537" and 65537 similarly.

red_admiral a day ago | parent [-]

That sounds horrible if you want to transmit a base64 string where the length is a multiple of 3 and for some cursed reason there are no letters or special characters involved. If "7777777777777777" is your encoded string because you're sending a string of periods encoded in BCD, you're going to have a fun time. Perhaps that's karma for doing something braindead in the first place though.

foobiekr 2 days ago | parent | prev | next [-]

Cost.

ayende 2 days ago | parent | prev [-]

Too likely that this would not work because of silent conversion to a number along the way.

iforgotpassword 2 days ago | parent [-]

Then just prefixing it with an underscore or any random letter would've been fine, but of course base64-encoding the binary representation makes you look so much smarter.

JackSlateur 2 days ago | parent | prev | next [-]

Is this ok ?

  Python 3.13.3 (main, May 21 2025, 07:49:52) [GCC 14.2.0] on linux
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import json
  >>> json.loads('47234762761726473624762746721647624764380000000000000000000000000000000000000000000')
  47234762761726473624762746721647624764380000000000000000000000000000000000000000000

sevensor a day ago | parent | next [-]

Just cross your fingers and hope for the best if your data is at any point decoded by a JSON library that doesn’t support bigints? Python’s ability to handle them is beside the point if they get mangled into IEEE 754 doubles along the way.

teddyh 2 days ago | parent | prev | next [-]

I prefer

  >> import json, decimal
  >> j = "47234762761726473624762746721647624764380000000000000000000000000000000000000000000"
  >> json.loads(j, parse_float=decimal.Decimal, parse_int=decimal.Decimal)
  Decimal('47234762761726473624762746721647624764380000000000000000000000000000000000000000000')
This way you avoid this problem:

  >> import json
  >> j = "0.47234762761726473624762746721647624764380000000000000000000000000000000000000000000"
  >> json.loads(j)
  0.47234762761726473
And instead can get:

  >> import json, decimal
  >> j = "0.47234762761726473624762746721647624764380000000000000000000000000000000000000000000"
  >> json.loads(j, parse_float=decimal.Decimal, parse_int=decimal.Decimal)
  Decimal('0.47234762761726473624762746721647624764380000000000000000000000000000000000000000000')

jazzyjackson 2 days ago | parent | prev [-]

yes, python falls into the sane language category with arbitrary-precision arithmetic

faresahmed a day ago | parent [-]

Not so much,

    >>> s="1"+"0"*4300
    >>> json.loads(s)
    ...
    ValueError: Exceeds the limit (4300 digits) for integer string conversion: 
    value has 4301 digits; use sys.set_int_max_str_digits() to increase the limit
This was done to prevent DoS attacks three years ago and has been backported to at least CPython 3.9, as it was considered a CVE.

Relevant discussion: https://news.ycombinator.com/item?id=32753235

Your sibling comment suggests using decimal.Decimal which handles parsing >4300 digit numbers (by default).

lifthrasiir a day ago | parent [-]

This should be interpreted as a stop-gap measure before a subquadratic algorithm can be adopted. Take a look at _pylong.py in new enough CPython.

zubspace a day ago | parent | prev | next [-]

Wouldn't it solve a whole lot of problems if we could just add optional type declarations to JSON? It seems so simple and obvious that I'm kinda dumbfounded that this is not a thing yet. Most of the time you would not need it, but it would prevent the parser from making a wrong guess in all those edge cases.

Probably there are types not every parser/language can accept, but at least it could throw a meaningful error instead of guessing or even truncating the value.

ivanbakel a day ago | parent | next [-]

I doubt that would fix the issue. The real cause is that programmers mostly deal in fixed-size integers, and that’s how they think of integer values, since those are the concepts their languages provide. If you’re going to write a JSON library for your favourite programming language, you’re going to reach for whatever ints are the default, regardless of what the specs or type hints suggest.

Haskell’s Aeson library is one of the few exceptions I’ve seen, since it only parses numbers to ‘Scientific’s (essentially a kind of bigint for rationals.) This makes the API very safe, but also incredibly annoying to use if you want to just munge some integers, since you’re forced to handle the error case of the unbounded values not fitting in your fixed-size integer values.

Most programmers likely simply either don’t consider that case, or don’t want to have to deal with it, so bad JSON libraries are the default.

movpasd a day ago | parent | prev [-]

This is actually a deliberate design choice, which the breathtakingly short JSON standard explains quite well [0]. The designers didn't introduce any semantics and pushed all that to the implementors. I think this is a defensible design goal. If you introduce semantics, you're sure to annoy someone.

There's an element of "worse is better" here [1]. JSON overtook XML exactly because it's so simple and solves for the social element of communication between disparate projects with wildly different philosophies, like UNIX byte-oriented I/O streams, or like the C calling conventions.

---

[0] https://ecma-international.org/publications-and-standards/st...

[1] https://en.wikipedia.org/wiki/Worse_is_better

tempodox 2 days ago | parent | prev | next [-]

“Worse is better” is still having ravaging success.

drob518 2 days ago | parent | prev | next [-]

Seems like a large integer can always be communicated as a vector of byte values in some specific endian order, which is easier to deal with than Base64 since a JSON parser will at least convert the byte value from text to binary for you.
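
A minimal sketch of that byte-vector approach (Python, with 65537 as the value):

    import json

    n = 65537
    wire = json.dumps(list(n.to_bytes((n.bit_length() + 7) // 8, "big")))
    print(wire)                                            # [1, 0, 1]
    print(int.from_bytes(bytes(json.loads(wire)), "big"))  # 65537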

But yeah, as a Clojure guy, I think sexprs or EDN would be much better.

supermatt a day ago | parent | prev | next [-]

As you said - it’s not really a problem with the JSON structure and format itself, but the underlying parser, which is specifically designed to map to the initial js types. There are parsers that don’t have this problem, but then the JSON itself is not portable.

The problem with your solution is that it’s also not portable for the same reason (it’s not part of the standard), and the reason that it wasn’t done that way in the first place is because it wouldn’t map to those initial js types!

FYI, you can easily work around this by using replacer and revivers that are part of the standards for stringify and parse and treat numbers differently. But again, the json isn’t portable to places without those replacer/revivers.

I.e, the real problem is treating something that looks like json as json by using standards compliant json parsers - not the apparent structure of the format itself. You could fix this problem in an instant by calling it something other than JSON, but people will see it and still use a JSON parser because it looks like JSON, not because it is JSON.

zelphirkalt a day ago | parent [-]

Isn't the actual problem that it is supposed to map to JS types, which are badly designed and thus end up infecting other ecosystems that don't have these defects?

mnahkies a day ago | parent | prev | next [-]

I'm still haunted by a bug caused by the JSON serializer our C# apps were using emitting bigints as JSON numbers, only for the JavaScript consumers to mangle them silently.

Kinda blows my mind that the accepted behavior is to just overflow and not raise an exception.

I try to stick to strings for anything that's not a 32 bit int now.

em-bee 2 days ago | parent | prev | next [-]

as someone who started the s-expression task on rosettacode.org, i approve. if you need an s-expression parser for your language, look here https://rosettacode.miraheze.org/wiki/S-expressions (the canonical url is https://rosettacode.org/wiki/S-expressions but they have DNS issues right now)

fulafel a day ago | parent | prev | next [-]

Is the correct number implementation really the exception? The first 2 json decoders I just tried (Python & Clojure) worked correctly with that example.

mindcrime 2 days ago | parent | prev | next [-]

  JSON is better than XML, but it really isn’t great. 
JSON doesn't even support comments, c'mon. I mean, it's handy for some things, but I don't know if I'd say "JSON is better than XML" in any universal sense. I still go by the old saw "use the right tool for the job at hand". In some cases maybe it's JSON. In others XML. In others S-Exprs encoded in EBCDIC or something. Whatever works...

deepsun 17 hours ago | parent [-]

Yup, imagine if HTML was JSON-like, not XML-like.

rendaw a day ago | parent | prev | next [-]

Canonical S-expressions don't have an object/mapping type, which means you can't have generic tooling unambiguously perform certain common operations like data merges.

ownedthx 21 hours ago | parent | prev | next [-]

The numerical issues here are due to JavaScript, not JSON.

TZubiri 2 days ago | parent | prev | next [-]

It feels like malpractice to use json in encryption

red_admiral a day ago | parent [-]

Sadly JWT and friends are "standard". In theory the representation and the data are independent and you can marshal and unmarshal correctly.

In practice, "alg:none" is a headache and everyone involved should be ashamed.

matja 2 days ago | parent | prev | next [-]

Aren't JSON parsers technically not following the standard if they don't reliably store a number that is not representable by a IEEE754 double precision float?

It's a shame JSON parsers usually default to performance rather than correctness (which would mean using bignums for numbers).

q3k 2 days ago | parent | next [-]

Have a read through RFC7159 or 8259 and despair.

> This specification allows implementations to set limits on the range and precision of numbers accepted

JSON is a terrible interoperability standard.

matja 2 days ago | parent [-]

So a JSON parser that cannot store a 2 is technically compliant? :(

reichstein 2 days ago | parent | next [-]

JSON is a text format. A parser must recognize the text `2` as a valid production of the JSON number grammar.

Converting that text to _any_ kind of numerical value is outside the scope of the specification. (At least the JSON.org specification, the RFC tries to say more.)

As a textual format, when you use it for data interchange between different platforms, you should ensure that the endpoints agree on the _interpretation_; otherwise they won't see the same data.

Again outside of the scope of the JSON specification.

deepsun 16 hours ago | parent | next [-]

The more a format restricts, the more useful it is. E.g. if a format allows pretty much anything and it's up to parsers to accept or reject it, we may as well say "any text file" (or even "any data file") -- it would allow for anything.

Similarly to a "schema-less" DBMS -- you will still have a schema, it will just be in your application code, not enforced by the DBMS.

JSON is a nice balance between convenience and restrictions, but it's still a compromise.

tsimionescu a day ago | parent | prev [-]

A JSON parser has to check if a numeric value is actually numeric - the JSON {"a" : 123456789} is valid, but {"a" : 12345678f} is not. Per the RFC, a standards-compliant JSON parser can also refuse {"a": 123456789} if it considers the number is too large.

q3k 2 days ago | parent | prev [-]

Yep. Or one that parses it into a 7 :)

chasd00 2 days ago | parent | next [-]

> Or one that parses it into a 7 :)

if it's known and acceptable that LLMs can hallucinate arguments to an API then i don't see how this isn't perfectly acceptable behavior either.

kevingadd 2 days ago | parent | prev [-]

I once debugged a production issue that boiled down to "A PCI compliance .dll was messing with floating point flags, causing the number 4 to unserialize as 12"

xeromal 2 days ago | parent [-]

That sounds awful. lol

kens 2 days ago | parent | prev [-]

> Aren't JSON parsers technically not following the standard if they don't reliably store a number that is not representable by a IEEE754 double precision float?

That sentence has four negations and I honestly can't figure out what it means.

alterom a day ago | parent | next [-]

>> Aren't JSON parsers technically not following the standard if they don't reliably store a number that is not representable by a IEEE754 double precision float?

>That sentence has four negations and I honestly can't figure out what it means.

This example is halfway as bad as the one Orwell gives in my favorite essay, "Politics and the English Language"¹.

Compare and contrast:

>I am not, indeed, sure whether it is not true to say that the Milton who once seemed not unlike a seventeenth-century Shelley had not become, out of an experience ever more bitter in each year, more alien (sic) to the founder of that Jesuit sect which nothing could induce him to tolerate.

Orwell has much to say about either.

_____

¹https://www.orwellfoundation.com/the-orwell-foundation/orwel...

NooneAtAll3 18 hours ago | parent [-]

that Orwell quote can be saved a lot by proper punctuation

I am not, indeed, sure*,* whether it is not true to say that the Milton *(*who once seemed not unlike a seventeenth-century Shelley*)* had not become *-* out of an experience *-* ever more bitter in each year, more alien (sic) to the founder of that Jesuit sect*,* which nothing could induce him to tolerate.

NooneAtAll3 18 hours ago | parent | prev | next [-]

Aren't {X}? -> isn't it true that {X}?

{X} = JSON parsers technically [are] not following the standard if {reason}

{reason} = [JSON parsers] don't reliably store a number that {what kind of number?}

{what kind of number} = number that is not representable by a IEEE754 double precision float

seems simple

umanwizard 2 days ago | parent | prev [-]

“The standard technically requires that JSON parsers reliably store numbers, even those that are not representable by an IEEE double”.

(It seems this claim is not true, but at least that’s what the sentence means.)

rr808 a day ago | parent | prev | next [-]

> JSON is better than XML

hard disagree on that one.

motorest 18 hours ago | parent [-]

> hard disagree on that one.

Ok, thanks for your insight.

rr808 13 hours ago | parent [-]

You're welcome! I hope it was helpful.

llm_nerd 20 hours ago | parent | prev [-]

>JSON is better than XML

JSON is still hack garbage compared to XML from the turn of the millennium. Like most dominant tech standards, JSON took hold purely because many developers are intellectually lazy and it was easier to slam some sloppy JSON together than to understand XML.

XML with XSD, XPath and XQuery is simply a glorious combination.