georgemandis 2 hours ago

The language handled it fine. It will generally just show replacement characters (�) for combos that don't map to anything.

It was really `encodeURIComponent` that didn't handle it gracefully.

If you just type this into the console (surrogate pair for cowboy smiley face emoji), you see it encodes it ("%F0%9F%A4%A0"):

encodeURIComponent("\uD83E\uDD20")

If you give it an invalid surrogate pair (here, the same two halves in swapped order), it throws an actual error, a `URIError`:

encodeURIComponent("\uDD20\uD83E")
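For anyone who wants to guard against this, here is a minimal sketch that wraps the call and catches the `URIError`. The helper name `safeEncode` and the choice to return `null` are made up for illustration, not anything from a library:

```javascript
// Hypothetical helper: returns the percent-encoded string, or null when
// the input contains a lone surrogate and encodeURIComponent throws.
function safeEncode(str) {
  try {
    return encodeURIComponent(str);
  } catch (err) {
    // Only swallow the malformed-string case; rethrow anything else.
    if (err instanceof URIError) return null;
    throw err;
  }
}

const encodedCowboy = safeEncode("\uD83E\uDD20");  // valid pair
const encodedSwapped = safeEncode("\uDD20\uD83E"); // halves swapped
```

With the two inputs from above, `encodedCowboy` is "%F0%9F%A4%A0" and `encodedSwapped` is `null`.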

chrismorgan an hour ago | parent [-]

No, the language did not handle it fine. It allowed an invalid Unicode string to exist. This is basically a UTF-16 affliction—nothing that does UTF-16 validates, whereas almost everything that does UTF-8 does validate. encodeURIComponent deals with UTF-8, so of course it throws.
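To make "validates" concrete, here is a rough sketch of the check that UTF-16 strings skip: scanning for surrogates that are not part of a high-then-low pair. The name `isWellFormedUtf16` is hypothetical; newer engines ship a built-in `String.prototype.isWellFormed()`, and this regex version is only an approximation for environments without it:

```javascript
// Matches a high surrogate not followed by a low one, or a low surrogate
// not preceded by a high one. No `u` flag, so it works on code units.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

// Hypothetical helper: true when the string has no lone surrogates.
function isWellFormedUtf16(str) {
  return !LONE_SURROGATE.test(str);
}

const pairOk = isWellFormedUtf16("\uD83E\uDD20");      // true
const swappedOk = isWellFormedUtf16("\uDD20\uD83E");   // false
```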

georgemandis 3 minutes ago | parent [-]

I'm realizing `encodeURIComponent` is actually part of the ECMA spec! I thought it was something provided by the browser like `window` or `document`. I withdraw my "the language handled it fine" comment, haha.

Before I'd looked that up I was going to say: "don't allow an invalid Unicode string to exist at all" feels like a separate/bigger problem to me from "handling it fine" when they do get created. To the extent I can hand JavaScript an invalid combination of code units in a variety of other scenarios, returning a � felt fine.

e.g.

// valid
String.fromCodePoint(0xd83e, 0xdd20)

// invalid, but "�" is ... fine?
String.fromCodePoint(0xdd20, 0xd83e)
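A quick sketch of what those two calls actually store, as far as I can tell: the � only shows up when the string is displayed, while the string itself still holds the original code units. The variable names are just for illustration:

```javascript
const valid = String.fromCodePoint(0xd83e, 0xdd20);
const swapped = String.fromCodePoint(0xdd20, 0xd83e);

// Both strings hold two UTF-16 code units.
const validUnits = valid.length;     // 2
const swappedUnits = swapped.length; // 2

// Iterating by code point: the valid pair combines into one code point,
// the swapped one stays as two lone surrogates.
const validPoints = [...valid].length;     // 1
const swappedPoints = [...swapped].length; // 2

// The stored unit is untouched; nothing was replaced with U+FFFD.
const firstSwappedUnit = swapped.charCodeAt(0).toString(16); // "dd20"
```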