ricardobeat 14 hours ago

An int will be 32 bits on any non-ancient platform, so this means, for each of those lines (a concrete sketch follows the list):

- a JSON file with nesting deeper than 2 billion levels

- a file with more than 2 billion lines

- a line with more than 2 billion characters
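
For concreteness, the ceiling all three counters share (a minimal sketch, not taken from the library itself):

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        /* On any platform with a 32-bit int, this prints 2147483647. */
        printf("INT_MAX = %d\n", INT_MAX);
        /* A plain-int depth/line/column counter hits signed-overflow
           undefined behavior one increment past this value. */
        return 0;
    }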

fizzynut 13 hours ago | parent | next

It's the depth that is 32-bit, not the index into the file.

If you are nesting 2 billion levels deep (at a minimum that means repeating { 2 billion times, then a value, then } another 2 billion times), you have messed up.

You have 4GB of "padding"...at minimum.

Your file is going to be petabytes in size for this to make any sense.

You are using a terrible format for whatever you are doing.

You are going to need a completely custom parser because nothing will fit in memory. I don't care how much RAM you have.

Simply accessing an element means traversing a nested object 2 billion times, which in probably any parser in the world will take somewhere between minutes and weeks per access.

All that is going to happen in this program is a crash.

I appreciate that people want to have some pointless if(depth > 0) check everywhere, but if your depth is anywhere north of a million in any real-world program, something messed up a long long time ago, never mind waiting until it hits 2 billion.
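
In practice the sane guard is a hard cap, not an overflow check (a sketch; MAX_DEPTH and the error code are made up, not the library's):

    #define MAX_DEPTH 1024  /* generous; real-world documents rarely pass ~100 */

    /* in the descent step: */
    if (++depth > MAX_DEPTH)
        return JSON_ERR_TOO_DEEP;  /* hypothetical error code */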

klysm 14 hours ago | parent | prev | next

2 billion characters seems fairly plausible to hit in the real world

ricardobeat 12 hours ago | parent | next

In a single line. Still not impossible, but people handling that amount of data will likely not have “header only and <150 lines” as a strong criterion for choosing their JSON parsing library.

xigoi 4 hours ago | parent | prev | next

For such big data, you should definitely be using an efficient format, not JSON.

naasking 13 hours ago | parent | prev

2GB in a single JSON file is definitely an outlier. A simple caveat when using this header could suffice: ensure inputs are less than 2GB.
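
A guard at the call site would be a one-liner (a sketch; the function name is illustrative):

    #include <limits.h>
    #include <stddef.h>

    /* Hypothetical pre-parse check: refuse any input the int-based
       counters could not index safely. This also bounds the depth,
       line, and column counts, since none can meaningfully exceed
       the byte count. */
    static int safe_to_parse(size_t input_len) {
        return input_len < (size_t)INT_MAX;
    }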

layer8 13 hours ago | parent | next

Less than INT_MAX, more accurately. But since the library contains a check when decreasing the counter, it might as well have a check when increasing the counter (and line/column numbers).
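
Mirroring the decrement check with a checked increment costs almost nothing (a sketch; the helper is mine, not the library's):

    #include <limits.h>

    /* Fail before a counter can wrap (signed overflow is UB). */
    static int checked_inc(int *counter) {
        if (*counter == INT_MAX)
            return 0;  /* caller reports an overflow error */
        (*counter)++;
        return 1;
    }

The same helper would serve the depth, line, and column counters alike.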

jeroenhd 12 hours ago | parent | prev | next

I've seen much bigger, though technically that wasn't valid JSON but structured logging with one JSON document per line. On the other hand, I've seen exported JSON files that could grow to such sizes without anything weird going on; mine never exceeded a couple hundred megabytes only because I didn't use the software for long enough.

Restricting the input to a reasonable size is an easy workaround for sure, but this limitation isn't documented anywhere, so anyone pulling this random project into their important code wouldn't know to defend against such a situation.

In a web server scenario, 2GiB of { (which would trigger two overflows, depth and column) in a compressed request body would only take a couple hundred kilobytes to two megabytes on the wire, depending on how old your server software is.
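
The defense there is a cap on the decompressed size, not the wire size (a sketch using zlib; the buffer size and return convention are arbitrary choices):

    #include <zlib.h>
    #include <stddef.h>

    /* Inflate, refusing to expand beyond `limit` bytes; returns 1 on a
       complete stream within the limit, 0 otherwise (likely a bomb). */
    static int inflate_capped(const unsigned char *in, size_t in_len,
                              size_t limit) {
        z_stream s = {0};
        unsigned char buf[16384];
        size_t total = 0;
        int ret;

        if (inflateInit(&s) != Z_OK)
            return 0;
        s.next_in = (unsigned char *)in;
        s.avail_in = (uInt)in_len;
        do {
            s.next_out = buf;
            s.avail_out = sizeof buf;
            ret = inflate(&s, Z_NO_FLUSH);
            if (ret != Z_OK && ret != Z_STREAM_END)
                break;
            total += sizeof buf - s.avail_out;
            if (total > limit) {  /* expanding past the cap: stop early */
                inflateEnd(&s);
                return 0;
            }
        } while (ret != Z_STREAM_END);
        inflateEnd(&s);
        return ret == Z_STREAM_END;
    }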

EasyMark 13 hours ago | parent | prev | next

Or fork it and make a few modifications to handle it? I have to admit I haven't looked to see whether this particular code would allow for that.

maleldil 12 hours ago | parent | prev

Not really. I deal with this every day. If the library has a limit on the input size, it should mention it.

naasking 8 hours ago | parent

If you deal with this every day, you're an outlier.

ranger_danger 13 hours ago | parent | prev | next

What is your definition of non-ancient? There are still embedded systems being produced today that don't have 32-bit integers.

layer8 14 hours ago | parent | prev

All very possible on modern platforms.

Maybe more importantly, I won’t trust the rest of the code if the author doesn’t seem to have the finite range of integer types in mind.

johnisgood 14 hours ago | parent

Personally, all my C code is written with the SEI CERT C Coding Standard in mind.
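
For the counters at issue, that standard's INT32-C rule ("ensure that operations on signed integers do not result in overflow") prescribes testing the precondition before the arithmetic, e.g. (a sketch):

    #include <limits.h>

    /* INT32-C style: check before adding, never after. */
    if (depth > INT_MAX - 1) {
        /* handle the would-be overflow */
    } else {
        depth += 1;
    }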