cryptonector 4 days ago
> But it does have a wart in that there are byte sequences which are invalid UTF-8 and how to interpret them is undefined.

This is not a wart. And how to interpret them is not undefined -- you're just not allowed to interpret them as _characters_.

There is right now a discussion[0] about adding a garbage-in/garbage-out mode to jq/jaq/etc that allows them to read and output JSON with invalid UTF-8 strings representing binary data in a way that round-trips. I'm not for making that the default for jq, and you have to be very careful about this to make sure that all the tools you use to handle such "JSON" round-trip the data. But the clever thing is that the proposed changes indeed do not interpret invalid byte sequences as character data, so they stay within the bounds of Unicode as long as your terminal (if these binary strings end up there) and other tools also do the same.
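To make the "round-trips without being interpreted as characters" idea concrete, here is a minimal sketch using Python's `surrogateescape` error handler (PEP 383). This is an analogy only: I'm not claiming it is the mechanism the jq/jaq proposal uses, and the helper names (`decode_lossless`, `encode_lossless`, `is_valid_utf8`) are made up for illustration.

```python
def decode_lossless(raw: bytes) -> str:
    # Valid UTF-8 decodes normally. Each byte that is not part of a valid
    # sequence is mapped to a lone surrogate in U+DC80..U+DCFF, which is
    # not a character, just a placeholder that remembers the byte.
    return raw.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    # Inverse of decode_lossless: placeholders turn back into the
    # original bytes, everything else is encoded as ordinary UTF-8.
    return text.encode("utf-8", errors="surrogateescape")


def is_valid_utf8(raw: bytes) -> bool:
    # A strict decoder simply refuses invalid input.
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A JSON-looking string with two bytes that can never appear in UTF-8.
raw = b'"ok \xff\xfe bytes"'
text = decode_lossless(raw)

assert not is_valid_utf8(raw)          # strict tools reject it
assert encode_lossless(text) == raw    # garbage in, same garbage out
```

The caveat from above still applies: a strict `text.encode("utf-8")` on that string raises `UnicodeEncodeError`, so the round-trip only survives if every tool in the pipeline handles the placeholders the same way.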