sedatk 4 days ago

> For example, how do you handle UTF-8 encoded surrogate pairs?

Surrogate pairs aren’t applicable to UTF-8. That range of Unicode code points (U+D800 to U+DFFF) is simply invalid in UTF-8 and should be treated as such (as a parsing error, as invalid characters, etc.).
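
To illustrate, here is a minimal Python sketch of what "treated as invalid" can look like. The helper names `decode_strict` and `decode_lenient` are made up for this example, and CPython's built-in UTF-8 codec stands in for "a conforming decoder"; other decoders may behave differently.

```python
from typing import Optional

# ED A0 80 is what comes out if U+D800 (a high surrogate) is pushed
# through the generic 3-byte UTF-8 bit pattern. It is not valid UTF-8.
ENCODED_SURROGATE = b"\xed\xa0\x80"


def decode_strict(data: bytes) -> Optional[str]:
    """Hypothetical helper: return the decoded text, or None on invalid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_lenient(data: bytes) -> str:
    """Hypothetical helper: substitute U+FFFD for invalid bytes instead of failing."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Strict mode: the input is rejected (the "parsing error" option).
    print(decode_strict(ENCODED_SURROGATE))
    # Lenient mode: the input becomes replacement characters
    # (the "invalid characters" option).
    print(repr(decode_lenient(ENCODED_SURROGATE)))
    # The reverse direction fails too: a lone surrogate cannot be encoded.
    try:
        "\ud800".encode("utf-8")
    except UnicodeEncodeError as err:
        print("cannot encode:", err.reason)
```

Either policy is defensible; the point is only that the bytes never decode to a surrogate code point.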

gritzko 4 days ago | parent [-]

In theory, yes. In practice, there are throngs of parsers and converters that might handle such cases differently. https://seriot.ch/projects/parsing_json.html

sedatk 3 days ago | parent [-]

I mean hopefully not, but the linked example is about JSON parsing, not UTF-8.

gritzko 2 days ago | parent [-]

A big chunk of the bugs there are Unicode-related; that is my point. When people parse JSON, they don't realize that they are also parsing Unicode.
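
A small Python sketch of the kind of thing I mean, using only the standard `json` module. The helper name `utf8_safe` is made up for this example, and this shows one parser's behavior, not what every parser does.

```python
import json


def utf8_safe(json_text: str) -> bool:
    """Hypothetical helper: parse a JSON string value and report whether
    the result can be encoded as UTF-8."""
    value = json.loads(json_text)
    try:
        value.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # A proper escaped surrogate pair: the parser combines it into one
    # code point (U+1F600), which encodes to UTF-8 without trouble.
    print(utf8_safe('"\\ud83d\\ude00"'))

    # A lone escaped high surrogate: CPython's json module accepts it and
    # hands back a str containing U+D800, which cannot be encoded as UTF-8.
    print(utf8_safe('"\\ud800"'))
```

So the JSON layer happily accepts input that the UTF-8 layer later refuses, and the failure shows up far from the parser.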