Remix.run Logo
anitil 3 days ago

This was all very interesting, but that polyglot json/yaml/xml payload was a big surprise to me! I had no idea that go's default xml parser would accept proceeding and trailing garbage. I'd always thought of json as one of the simpler formats to parse, but I suppose the real world would beg to differ.

It's interesting that decisions made about seemingly-innocuous conditions like 'what if there are duplicate keys' have a long tail of consequences

mdaniel 2 hours ago | parent | next [-]

> I'd always thought of json as one of the simpler formats to parse, but I suppose the real world would beg to differ

Parsing JSON Is a Minefield (2018) - https://news.ycombinator.com/item?id=40555431 - June, 2024 (56 comments)

et al https://hn.algolia.com/?query=parsing%20json%20is%20a%20mine...

shakna 11 hours ago | parent | prev [-]

Tangent for breaking Python's JSON parser: This has worked for five years. The docs do not say that parsing an invalid piece will result in a RecursionError. They specify JSONDecodeError and UnicodeDecodeError. (There is a RecursionError reference to a key that is off by default - but if its off, we can still raise this...)

    #!/bin/sh

    # Python will hit it's recursion limit
    # If you supply just 4 less than the recursion limit
    # I assume this means there's a few objects on the call stack first
    # Probably: __main__, print, json.loads, and input.

    n="$(python3 -c 'import math; import sys; sys.stdout.write(str(math.floor(sys.getrecursionlimit() - 4)))')"

    echo "N: $n"

    # Obviously invalid, but unparseable without matching pair
    # JSON's grammar is... Not good at being partially parsed.
    left="$(yes [ | head -n "$n" | tr -d '\n')"

    # Rather than exploding with the expected decodeError
    # This will explode with a RecursionError
    # Which naturally thrashes the memory cache.
    echo "$left" | python3 -c 'import json; print(json.loads(input()))'