RedShift1 a day ago

Ok, but in practice, what does this mean for the characters? Are there certain characters unavailable?

chrismorgan a day ago | parent | next [-]

It’s the unpaired surrogate code points. That’s the whole thing. It’s about encoding ill-formed UTF-16, which is distressingly common in the real world.
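
To make "unpaired surrogate" concrete, here is a minimal Python sketch (the helper name `is_well_formed_utf16` is made up for illustration). It packs 16-bit code units into bytes and asks Python's strict UTF-16 decoder whether the sequence is well-formed; a high surrogate followed by a low surrogate passes, while a surrogate on its own does not.

```python
import struct


def is_well_formed_utf16(units):
    """Return True if the 16-bit code units form well-formed UTF-16.

    Hypothetical helper for illustration: it relies on Python's strict
    UTF-16 decoder, which rejects unpaired surrogates.
    """
    raw = struct.pack(f"<{len(units)}H", *units)  # little-endian code units
    try:
        raw.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return True


# High surrogate 0xD83D followed by low surrogate 0xDE00 encodes U+1F600.
print(is_well_formed_utf16([0xD83D, 0xDE00]))
# A high surrogate with no low surrogate after it is ill-formed.
print(is_well_formed_utf16([0x0061, 0xD83D, 0x0062]))
```

Any API that treats strings as plain arrays of 16-bit units can produce the second kind of sequence, for instance by truncating a string in the middle of a surrogate pair.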

numpad0 a day ago | parent | prev [-]

Broken emoji? Apparently there are known issues where some frameworks split Unicode text at the wrong boundaries; maybe the author saw that develop into a deeper mess.

masklinn a day ago | parent [-]

It’s not just broken emoji, it’s straight-up broken content: UTF-8 cannot represent unpaired surrogates.

WTF-8 is necessary for Rust’s compatibility with Windows filesystems (it underlies OsString on Windows), since file names there are sequences of UTF-16 code units and thus may contain unpaired surrogates.
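
A rough way to see the difference from Python, as a sketch rather than a real WTF-8 implementation: strict UTF-8 encoding refuses a lone surrogate, while the `surrogatepass` error handler emits the three-byte sequence that a WTF-8 encoder would also be expected to use for it. The function name `utf16_units_to_wtf8_like` is hypothetical, and this is not how Rust implements OsString.

```python
import struct


def utf16_units_to_wtf8_like(units):
    """Convert possibly ill-formed UTF-16 code units to WTF-8-style bytes.

    Illustration only: properly paired surrogates are combined into one
    code point by the UTF-16 decoder, and only lone surrogates are passed
    through by the "surrogatepass" error handler.
    """
    raw = struct.pack(f"<{len(units)}H", *units)
    text = raw.decode("utf-16-le", errors="surrogatepass")
    return text.encode("utf-8", errors="surrogatepass")


# 'a', lone high surrogate U+D800, 'b': not representable in strict UTF-8.
lossless = utf16_units_to_wtf8_like([0x0061, 0xD800, 0x0062])
print(lossless.hex(" "))  # 61 ed a0 80 62

try:
    "a\ud800b".encode("utf-8")  # strict UTF-8 rejects the surrogate
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)  # False

# A well-formed pair still becomes ordinary four-byte UTF-8.
print(utf16_units_to_wtf8_like([0xD83D, 0xDE00]).hex(" "))  # f0 9f 98 80
```

The point of the round trip is that the original code units can be recovered from the bytes, which is what a file name API needs; a lossy conversion to U+FFFD would make some existing files impossible to name.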