| ▲ | layer8 15 hours ago |
| The library doesn’t check for signed integer overflow here: https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1... https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1... https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1... Certain inputs can therefore trigger UB. |
|
| ▲ | hypeatei 14 hours ago | parent | next [-] |
| You're not aware of the simplistic, single header C library culture that some developers like to partake in. Tsoding (a streamer) is a prime example of someone who likes developing/using these types of libraries. They acknowledge that these things aren't focused on "security" or "features" and that's okay. Not everything is a super serious business project exposed to thousands of paying customers. |
| |
| ▲ | layer8 14 hours ago | parent | next [-] | | Hobby projects that prove useful have a tendency of starting to be used in production code, and then turning into CVEs down the road. If there is a conscious intent of disregarding safety as you say, the Readme should have a prominent warning about that. | | |
| ▲ | hypeatei 14 hours ago | parent | next [-] | | > Hobby projects that prove useful have a tendency of starting to be used in production code Even if that is true, how is that the author's problem? The license clearly states that they're not responsible for damages. If you were developing such a serious project, then you would need the appropriate vetting process and/or support contracts for your dependencies. | | |
| ▲ | layer8 14 hours ago | parent [-] | | I didn’t say it’s the author’s problem. It’s a problem with the code. | | |
| ▲ | ethanwillis 12 hours ago | parent [-] | | Why play all these semantic games? You're saying it's the author's problem. You want them to even edit their readme to include warnings for would-be production/business users who don't want to pay for it. | | |
| ▲ | layer8 9 hours ago | parent | next [-] | | GP is arguing about licences. Yes, formally there is no obligation, and I'm not saying the author has any such obligation. In the present case, either the missing overflow check in the code is by mistake, and then it's warranted to point out the error, or, as I understood GGGP to be arguing, the author deliberately decided to neglect safety or correctness, and then in my opinion you can't reject the criticism as unwarranted if the project's presentation isn't explicit about that. I'm not making anything the author's problem here. Rather, I'm defending my criticism of the code, and am giving arguments as to why it is generally good form to make it explicit if a project doesn't care about the code being safe and correct. | |
| ▲ | president_zippy 6 hours ago | parent | prev [-] | | [flagged] | | |
| ▲ | lixtra 4 hours ago | parent [-] | | Layer8 DID the thing though, skimmed through the code and thought about security issues. |
|
|
|
| |
| ▲ | koolba 13 hours ago | parent | prev | next [-] | | > If there is a conscious intent of disregarding safety as you say, the Readme should have a prominent warning about that. What do you consider this clause in the LICENSE: >> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE. | | |
| ▲ | CorrectHorseBat 13 hours ago | parent [-] | | A standard clause you can find in every open source license? It doesn't say anything about how seriously the project takes security. | |
| ▲ | Yeask 7 hours ago | parent [-] | | You write only Rust code don't you? | | |
| ▲ | CorrectHorseBat 3 hours ago | parent [-] | | I wish ;) You're talking about how Rust code usually uses the MIT license and this is a part of the MIT license? Every open source license has a very similar clause, including but not limited to BSD, GPL, CDDL, MPL and Apache. |
|
|
| |
| ▲ | vrighter 14 hours ago | parent | prev | next [-] | | then that is their problem, not the code author's. If you use a hobby project in production, that's on you | |
| ▲ | f1shy 13 hours ago | parent | prev [-] | | My personal take is: if the code is good enough, it should be trivial to switch to a better library at the point when needed. |
| |
| ▲ | taminka 9 hours ago | parent | prev | next [-] | | > They acknowledge that these things aren't focused on "security" or "features" and that's okay. where? single header is just a way to package software, it has no relation to features, security or anything of the sort... |
| ▲ | TZubiri 4 hours ago | parent | prev | next [-] | | Either you are: - overestimating the gravity of UB and its security implications, - underestimating the value of a 150-line JSON parser, - or overestimating the feasibility of having both a short and high-quality parser. It sometimes happens that fixing a bug is quicker than defending the low quality. Not everything is a tradeoff. |
| ▲ | zwnow 14 hours ago | parent | prev [-] | | So if it's a hobby project designed for just a handful of people, it's suddenly okay to endanger them due to being sloppy? | |
| ▲ | hypeatei 14 hours ago | parent | next [-] | | This is an open source project that you're not obligated to use nor did you pay for it. Who is it endangering? The license also makes it clear that the authors aren't liable for any damages. | | |
| ▲ | flykespice 14 hours ago | parent [-] | | ...and what open source software license in the world makes the author liable for damages? | | |
| ▲ | Yeask 7 hours ago | parent | next [-] | | None. That is how RedHat makes money. | |
| ▲ | k_roy 13 hours ago | parent | prev [-] | | Probably more of a lack of explicit liability in the license. | |
| ▲ | virtue3 12 hours ago | parent [-] | | every OSS license I've ever seen is "use at your own risk" essentially. That's how this whole system works. You find a vulnerability? patch it, push change to repo maintainer. https://xkcd.com/2347 |
|
|
| |
| ▲ | nkrisc 13 hours ago | parent | prev | next [-] | | Neither the code nor the author endangers anyone. Whoever uses it inappropriately endangers themselves or others. Why are you using random, unvetted and unaudited code where safety is important? |
| ▲ | Yeask 7 hours ago | parent | prev | next [-] | | Open Source is about sharing knowledge. They are sharing their knowledge about how to create a tiny JSON parser. Where is the problem again? | | |
| ▲ | zwnow 2 hours ago | parent [-] | | Refer to the original comment. Seems like you are incapable of connecting the comment chain. |
| |
| ▲ | tossaway0 5 hours ago | parent | prev [-] | | Yes, pretty much. It has enough of a warning. |
|
|
|
| ▲ | skydhash 14 hours ago | parent | prev | next [-] |
| There was a nice article [0] about bloated edge-case libraries (discussion [1]). Sometimes, it's just not the responsibility of the library. Trying to handle every possible error is a quick way to complexity. [0]: https://43081j.com/2025/09/bloat-of-edge-case-libraries [1]: https://news.ycombinator.com/item?id=45319399 |
| |
| ▲ | klysm 14 hours ago | parent | next [-] | | Strongly disagree here because JSON can come from untrusted sources and this has security implications. It's not the same kind of problem that the bloat article discusses where you just have bad contracts on interfaces. | | |
| ▲ | Brian_K_White 12 hours ago | parent | next [-] | | Public-facing interfaces are their own special thing, regardless of whether it's JSON or anything else, and not all data is a public-facing interface. If you need it, then you need it. But if you don't need it, then you don't need it. There is a non-trivial value in the smallness and simplicity, and a non-trivial cost in trying to handle an infinity of problems when you don't have an infinity of use cases. | |
| ▲ | klysm 11 hours ago | parent [-] | | This is a serialization library. The entire point is to communicate with data that's coming from out of process. It should be safe by default, especially when all it takes is a quick check to avoid overflow and undefined behavior. | |
| ▲ | Brian_K_White 10 hours ago | parent [-] | | Incorrect assumption. If you are reading data from a file or stream that only you yourself wrote some other time, then it's true that the data could possibly have been corrupted or something, but it's not true that it's automatically worth worrying about enough to justify making the code, and thus its bug surface, larger. How likely is the problem, how bad are the consequences if the problem happens, how many edge cases could possibly exist, how much code does it take to handle them all? None of these are questions you or anyone else can answer about anyone else's project ahead of time. If the full-featured parser is too big, then the line drawing the scope of the lightweight parser has to go somewhere, and so of course there will be things on the other side of that line no matter where it is, except all the way back at full-featured-parser. "Just this one little check" is not automatically reasonable, because that check isn't automatically more important than any other, and they are all "just one little check"s. The one little check would prevent what? Maybe a problem that never happens or doesn't hurt when it does happen. A value might be misinterpreted? So what? Let it. Maybe it makes more sense to handle that in the application code, the one place it might matter. If it will matter so much, then maybe the application needs the full-fat library. |
|
| |
| ▲ | Yeask 7 hours ago | parent | prev | next [-] | | You would use this for parsing data you know is safe. Using a "tiny library" for parsing untrusted data is where the mistake is, not in OP's code. |
| ▲ | president_zippy 6 hours ago | parent | prev | next [-] | | It's too bad this header-only JSON library doesn't meet your requirements. How much did you pay for your license to use it? I'm sure the author will be happy to either ship security fixes or give you a refund. You should reach out to him and request support. | |
| ▲ | leptons 13 hours ago | parent | prev [-] | | JSON does not necessarily come from untrusted sources if you control the entire system. Not everything needs to be absolutely 100% secure so long as you control the system. If you are opening the system to the public, then sure, you should strive for security, but that isn't always necessary in projects that are not processing public input. Here's an example - I once coded a limited JSON parser in assembly language. I did not attempt to make it secure in any way. The purpose was to parse control messages sent over a serial port connection to an embedded CPU that controlled a small motor to rotate a camera and snap a photo. There was simply no way for any "untrusted" JSON to enter the system. It worked perfectly and nothing could ever be compromised by having a very simple JSON parser in the embedded device controlling the motor. | | |
| ▲ | interstice 12 hours ago | parent | next [-] | | Massively agree. Remember this thinking being everywhere with databases back in the day, not every text field is hooked up to a Wordpress comment section. | |
| ▲ | ghurtado 11 hours ago | parent | prev | next [-] | | > Not everything needs to be absolutely 100% secure so long as you control the system. Isn't that a bit like saying "you don't have to worry about home security as long as you are the only person who has the ability to enter your house"? | | |
| ▲ | ncruces 2 hours ago | parent | next [-] | | Sure. I don't password protect my (Android) TV like I password protect my (Android) phone, despite both of them allowing authorized access to the same Google accounts, because if someone entered my house I'd have bigger things to worry about than them using my TV. |
| ▲ | Yeask 7 hours ago | parent | prev | next [-] | | Not at all. | |
| ▲ | Mogzol 5 hours ago | parent | prev [-] | | I mean yeah if you're truly the only person that has the ability to enter your house then why should you worry about home security? Nobody else has the ability to get in. |
| |
| ▲ | boramalper 12 hours ago | parent | prev | next [-] | | Untrusted doesn't always mean adversarial IMO; even bit rot can invalidate your entire input and possibly also trigger undefined behaviour if you aren't prepared to handle that. | |
| ▲ | cwmoore 12 hours ago | parent | next [-] | | UB = "undefined behavior", thanks | |
| ▲ | leptons 11 hours ago | parent | prev [-] | | I was using a checksum to protect against "bitrot" since this was over a very noisy serial transmission line (over a slip ring). So, no, there was no "undefined behavior" and it's quite easy to avoid. |
| |
| ▲ | userbinator 12 hours ago | parent | prev [-] | | You probably didn't control the other end, as otherwise you would've used something more sane than JSON? | | |
| ▲ | leptons 11 hours ago | parent [-] | | I controlled both ends. There is nothing "insane" about JSON. It's used far and wide for many purposes. The system sending the JSON was based on Nodejs, so it was pretty natural to use JSON. And I did it with JSON just because I wanted to. I'd have had to invent some other protocol to do it anyway, and I didn't feel like reinventing the wheel when it was quite simple to write a basic JSON parser in assembly language, which is what I am comfortable with on the embedded system (been coding assembly for 40 years). | | |
| ▲ | userbinator 10 hours ago | parent [-] | | For something that simple I'd choose a custom binary protocol or something like ASN.1 instead of JSON. It's easier to generate from a HLL and parse in a LLL (I've also been writing Asm for a few decades...) | | |
| ▲ | leptons 3 hours ago | parent [-] | | I've done plenty of custom binary protocols before. I can't say they were any better or easier to deal with. I also can't say that the "parser" for a binary format was any easier than a simple, limited JSON parser. For this specific project I chose JSON and it worked perfectly. Sending JSON from the embedded CPU was also really simple. Yes, there was a little overhead on a slow connection, but I wasn't getting anywhere near saturation. I think it was 9600 bps max on a noisy connection with checksums. If even 10% of the JSON "packets" got through it was still plenty for the system to run. |
|
|
|
|
| |
| ▲ | layer8 14 hours ago | parent | prev | next [-] | | The problem in the present case is that the caller is not made aware of the limitation, so can’t be expected to prevent passing unsupported input, and has no way to handle the overflow case after the fact. | | |
| ▲ | skydhash 14 hours ago | parent [-] | | Do you not review libraries you add to your project? A quick scan of the issues page if it's on a forge? Or just reading through the code if it's small enough (or select functions)? Code is the ultimate specification. I don't trust the docs if the behavior is different from what they say (or, more often, fail to mention). And anything that deals with recursive structures (or looping without a clear counter and checks) is one of my first candidates for checks. > has no way to handle the overflow case after the fact. Fork/vendor the code and add your assertions. | |
| ▲ | layer8 14 hours ago | parent | next [-] | | Obviously I just did review it, and my conclusion was to not use that code. In the spirit of the article you linked, I’d rather write my own version. | |
| ▲ | jama211 13 hours ago | parent | prev | next [-] | | If it has limitations they should be documented though, right? Especially if they're security concerns. |
| ▲ | knowitnone2 7 hours ago | parent | prev [-] | | If you review libraries, why do you need to quick scan the issues? You would have already identified all the issues right? Right? |
|
| |
| ▲ | FooBarBizBazz 13 hours ago | parent | prev | next [-] | | This might be the right attitude for a max function written in JavaScript, where the calling code has some control over the inputs. It's the wrong attitude for a JSON parser written in C, unless you like to get owned. | |
| ▲ | paulddraper 12 hours ago | parent | prev | next [-] | | The amount of untrusted JSON I parse is very high. UB is bad. | |
| ▲ | flykespice 14 hours ago | parent | prev | next [-] | | There is no easy way out when you're working with C: either you handle all possible UB cases with exhaustive checks, or you move on to another language. (TIP: choose the latter) | | |
| ▲ | uecker 11 hours ago | parent | next [-] | | For signed overflow you can just turn on the sanitizer in trapping mode. Exhaustive checks are also not that terrible. |
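(For reference, a minimal sketch of what the trapping sanitizer looks like; the flag spellings below are the GCC/Clang UBSan ones as I know them, so check your compiler's docs.)

/* overflow.c: compile with the sanitizer in trapping mode, e.g.
 *   cc -fsanitize=signed-integer-overflow -fsanitize-undefined-trap-on-error overflow.c
 * so the increment below aborts at runtime instead of being silent UB. */
#include <limits.h>

int main(void) {
  int depth = INT_MAX;
  depth++;              /* traps here under the sanitizer */
  return depth;
}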
| ▲ | jeroenhd 13 hours ago | parent | prev [-] | | Very few programming languages default to checked increments. Most Rust or Java programmers would make the same mistake. Writing a function to do a checked addition like in other languages isn't exactly difficult, either. | | |
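(A sketch of such a function, using the __builtin_add_overflow builtin that GCC and Clang provide; C23 standardizes the same idea as ckd_add in <stdckdint.h>.)

#include <stdbool.h>

/* Checked increment: returns false instead of overflowing. */
static bool checked_inc(int *x) {
  int out;
  if (__builtin_add_overflow(*x, 1, &out)) return false;
  *x = out;
  return true;
}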
| |
| ▲ | RossBencina 7 hours ago | parent | prev [-] | | > Sometimes, it's just not the responsibility of the library. Sometimes. In this case, where the library is a parser written in C, I think it is reasonable to expect the library to handle all possible inputs, even corner cases like this which are unlikely to be encountered in common practice. This is not "bloat", it is correctness. In C, this kind of bug is capable of being exploited. Sure, many users of this lib won't be using it in exposed cases, but sooner or later the lib will end up in some widely-used internet-facing codebase. As others have said, the fix could be as simple as bailing once the input size exceeds 1GB. Or it could be fine-grained. Either way, the fix would not "bloat" the codebase. And yes, I'm well aware of the single-file C library movement. I am a fan. |
|
|
| ▲ | habibur 10 hours ago | parent | prev | next [-] |
| Will trigger UB if the nesting depth is > 2 billion or, in the 2nd case, the number of lines is > 2 billion. Limit your JSON input to 1 GB. I will have more problems in other portions of the stack if I start to receive a 2 GB JSON file over the web. And if I still want to make it work for > 2GB, I would change all ints in the source to 64 bits. It will still crash if the input is > 2^64. What I won't ever do in my code is check for int overflow. |
| |
| ▲ | jcalvinowens 9 hours ago | parent [-] | | > What I won't ever do in my code is check for int overflow Amen. Just build with -fno-strict-overflow, my hot take is that should be the default on Linux anyway. |
|
|
| ▲ | ricardobeat 14 hours ago | parent | prev | next [-] |
| An int will be 32 bits on any non-ancient platform, so this means, for each of those lines: - a JSON file nested more than 2 billion levels deep - a file with more than 2 billion lines - a line with more than 2 billion characters |
| |
| ▲ | fizzynut 13 hours ago | parent | next [-] | | The depth is 32-bit, not the index into the file. If you are nesting 2 billion times in a row (at minimum this means repeating { 2 billion times, followed by a value, before } another 2 billion times), you have messed up. You have 4GB of "padding"... at minimum. Your file is going to be petabytes in size for this to make any sense. You are using a terrible format for whatever you are doing. You are going to need a completely custom parser because nothing will fit in memory, I don't care how much RAM you have. Simply accessing an element means traversing a nested object 2 billion times, which in probably any parser in the world is going to take somewhere between minutes and weeks per access. All that is going to happen in this program is a crash. I appreciate that people want to have some pointless if(depth > 0) check everywhere, but if your depth is anywhere north of a million in any real-world program, something messed up a long, long time ago, never mind waiting until it hits 2 billion. | | | |
| ▲ | klysm 14 hours ago | parent | prev | next [-] | | 2 billion characters seems fairly plausible to hit in the real world | | |
| ▲ | ricardobeat 12 hours ago | parent | next [-] | | In a single line. Still not impossible, but people handling that amount of data will likely not have "header only and <150 lines" as a strong criterion for choosing their JSON parsing library. |
| ▲ | xigoi 4 hours ago | parent | prev | next [-] | | For such big data, you should definitely be using an efficient format, not JSON. | |
| ▲ | naasking 13 hours ago | parent | prev [-] | | 2GB in a single JSON file is definitely an outlier. A simple caveat when using this header could suffice: ensure inputs are less than 2GB. | | |
| ▲ | layer8 13 hours ago | parent | next [-] | | Less than INT_MAX, more accurately. But since the library contains a check when decreasing the counter, it might as well have a check when increasing the counter (and line/column numbers). | |
| ▲ | jeroenhd 12 hours ago | parent | prev | next [-] | | I've seen much bigger, though technically that wasn't valid json, but rather structured logging with JSON on each line. On the other hand, I've seen exported JSON files that could grow to such sizes without doing anything weird, just nothing exceeding a couple hundred megabytes because I didn't use the software for long enough. Restricting the input to a reasonable size is an easy workaround for sure, but this limitation isn't indicated everywhere, so anyone deciding to consume this random project into their important code wouldn't know to defend against such situation. In a web server scenario, 2GiB of { (which would trigger two overflows) in a compressed request would require a couple hundred kilobytes to two megabytes, depending on how old your server software is. | | | |
| ▲ | EasyMark 13 hours ago | parent | prev | next [-] | | Or fork and make a few modifications to handle it? I have to admit I haven't looked at the code to see if this particular code would allow for that. | |
| ▲ | maleldil 12 hours ago | parent | prev [-] | | Not really. I deal with this every day. If the library has a limit on the input size, it should mention this. | |
|
| |
| ▲ | ranger_danger 13 hours ago | parent | prev | next [-] | | What is your definition of non-ancient? There are still embedded systems being produced today that don't have 32-bit integers. | |
| ▲ | layer8 14 hours ago | parent | prev [-] | | All very possible on modern platforms. Maybe more importantly, I won’t trust the rest of the code if the author doesn’t seem to have the finite range of integer types in mind. | | |
|
|
| ▲ | pgen an hour ago | parent | prev | next [-] |
| The author has kindly provided you with simple, readable, and free code. If you find it incomplete or unsafe, you can always modify it and contribute your changes if you wish to improve it, in accordance with the licence; and thank him while you're at it. |
|
| ▲ | odie5533 14 hours ago | parent | prev | next [-] |
| Can't use this library in production that's for sure. |
| |
|
| ▲ | Gibbon1 2 hours ago | parent | prev | next [-] |
| I've been tending to use ssize_t for indexes instead of int. Part of the reason was reading someone's decent argument that for(int i=0; blah blah; i++)
is actually broken and dangerous on 64-bit machines. |
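(A sketch of that argument as I understand it, with hypothetical code: once the element count can exceed INT_MAX, a 32-bit int index has to overflow, which is UB, before the loop can finish, whereas a size_t or ssize_t index does not.)

#include <stddef.h>

void zero_all(unsigned char *buf, size_t n) {
  for (int i = 0; i < n; i++)      /* i is converted to size_t for the comparison;
                                      if n > INT_MAX, i++ overflows (UB) at INT_MAX */
    buf[i] = 0;

  for (size_t j = 0; j < n; j++)   /* wide index: fine for any n */
    buf[j] = 0;
}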
| |
| ▲ | oguz-ismail an hour ago | parent [-] | | How is ssize_t any better? It's not part of standard C and is only guaranteed to be capable of holding values between -1 and SSIZE_MAX (minimum 32767, no relation to SIZE_MAX). |
|
|
| ▲ | robmccoll 9 hours ago | parent | prev | next [-] |
| Could just change the input len to an int instead of size_t. Not technically the correct type, but it would make it clear to the user that the input can't be greater than 2^31 in length. |
|
| ▲ | EmilStenstrom 14 hours ago | parent | prev | next [-] |
| Submit a PR! |
|
| ▲ | modeless 9 hours ago | parent | prev | next [-] |
| I wouldn't expect a library like this to be secure. If you want it to be memory safe, compile it with Fil-C. |
| |
| ▲ | layer8 9 hours ago | parent [-] | | This has nothing to do with memory safety. | | |
| ▲ | modeless 9 hours ago | parent [-] | | This is an overstatement. Yes, UB does not necessarily cause a violation of memory safety, but triggering UB alone is not the goal of an attacker. UB is a means to an end and the end is usually a violation of memory safety leading to arbitrary code execution. | | |
| ▲ | layer8 8 hours ago | parent [-] | | The primary point was that the code doesn't ensure correct processing (or returning an appropriate error) for all JSON. Even if behavior is defined by the C implementation, the overflow can lead to parser mismatch vulnerabilities, if nothing else. There are likely other "defined" failure modes the overflow can enable here. UB was a secondary observation, but it can also lead to logic errors in that vein, without involving memory safety. I'm not sure I agree that UB usually leads to memory safety violations, but in any case, the fact that signed integer overflow is UB isn't what makes the code incorrect and unsafe in the first place. |
|
|
|
|
| ▲ | mscrnt 5 hours ago | parent | prev | next [-] |
| cut a PR to improve it; that would be nice |
|
| ▲ | oguz-ismail 12 hours ago | parent | prev [-] |
| diff --git a/sj.h b/sj.h
index 60bea9e..25f6438 100644
--- a/sj.h
+++ b/sj.h
@@ -85,6 +85,7 @@ top:
     return res;
   case '{': case '[':
+    if (r->depth > 999) { r->error = "can't go deeper"; goto top; }
     res.type = (*r->cur == '{') ? SJ_OBJECT : SJ_ARRAY;
     res.depth = ++r->depth;
     r->cur++;
There, fixed it |
| |