gavinsyancey 18 hours ago

WTF-8 is actually a real encoding, used to represent potentially ill-formed UTF-16 (i.e. containing unpaired surrogates) on systems built around UTF-8: https://simonsapin.github.io/wtf-8/
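
A minimal Python sketch of the idea, assuming the input is a list of 16-bit code units (possibly ill-formed UTF-16). Properly paired surrogates are combined into one supplementary code point; anything left over, including a lone surrogate, is written with the ordinary UTF-8 bit pattern. The names `wtf8_encode` and `utf16_units` are hypothetical, and Python's `surrogatepass` error handler is used only as a shortcut for emitting the three-byte surrogate sequences.

```python
def wtf8_encode(units: list[int]) -> bytes:
    """Encode 16-bit code units (possibly ill-formed UTF-16) as WTF-8 (hypothetical helper)."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # Lead + trail surrogate: a real pair, combined into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)
            i += 2
        else:
            # BMP character or unpaired surrogate: taken as a code point on its own.
            cp = u
            i += 1
        # surrogatepass lets a lone surrogate through as a 3-byte sequence (ED xx xx).
        out += chr(cp).encode("utf-8", "surrogatepass")
    return bytes(out)


def utf16_units(s: str) -> list[int]:
    """Demo helper: the UTF-16 code units of a well-formed Python string."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[k:k + 2], "little") for k in range(0, len(raw), 2)]


print(wtf8_encode([0x61, 0xD800, 0x62]))  # b'a\xed\xa0\x80b'
```

For well-formed UTF-16 the output is byte-for-byte identical to UTF-8; only a lone surrogate such as 0xD800 produces bytes (ED A0 80) that a strict UTF-8 decoder rejects.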

ptx 2 hours ago

Yeah, that had me confused for a bit. And you would never use "charset=wtf-8" (as in this page's title) because the spec says:

"Any WTF-8 data must be converted to a Unicode encoding at the system’s boundary before being emitted. UTF-8 is recommended. WTF-8 must not be used to represent text in a file format or for transmission over the Internet."
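
If you do have WTF-8 in memory, one lossy way to honour that boundary rule is to swap each encoded lone surrogate for U+FFFD, in the spirit of Rust's `OsStr::to_string_lossy`. A sketch assuming well-formed WTF-8 input; `wtf8_to_utf8` is a hypothetical name, not something defined by the spec:

```python
def wtf8_to_utf8(data: bytes) -> bytes:
    """Lossily convert well-formed WTF-8 to UTF-8 (hypothetical helper)."""
    # surrogatepass accepts the 3-byte surrogate sequences WTF-8 adds on top of UTF-8.
    text = data.decode("utf-8", "surrogatepass")
    # In well-formed WTF-8 every surrogate is unpaired: replace each with U+FFFD.
    cleaned = "".join("\ufffd" if 0xD800 <= ord(ch) <= 0xDFFF else ch for ch in text)
    # Strict encode: no surrogates remain, so the result is valid UTF-8.
    return cleaned.encode("utf-8")


print(wtf8_to_utf8(b"a\xed\xa0\x80b"))  # b'a\xef\xbf\xbdb'
```

This is lossy by design: two different lone surrogates both become U+FFFD, so the conversion cannot be reversed.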

bjackman 17 hours ago

I believe this is what Rust OsStrings are under the hood on Windows.

extraduder_ire 15 hours ago

Which I assume stands for "Windows-Transformation-Format-8(bits)".

mmoskal 14 hours ago

From the spec's abstract:

"WTF-8 (Wobbly Transformation Format − 8-bit) is a superset of UTF-8 that encodes surrogate code points if they are not in a pair."
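
Reading "if they are not in a pair" literally: a lead surrogate immediately followed by a trail surrogate has to be written as one 4-byte sequence, so seeing the pair as two 3-byte sequences (the CESU-8 style) means the bytes are not WTF-8. A hedged validity check along those lines, with `is_wtf8` as a hypothetical helper that leans on Python's `surrogatepass` decoder for everything else (overlongs, stray continuation bytes, values above U+10FFFF):

```python
def is_wtf8(data: bytes) -> bool:
    """Check generalized UTF-8 with no encoded surrogate pairs (hypothetical helper)."""
    try:
        # Rejects malformed UTF-8 but lets 3-byte surrogate sequences through.
        text = data.decode("utf-8", "surrogatepass")
    except UnicodeDecodeError:
        return False
    for a, b in zip(text, text[1:]):
        if 0xD800 <= ord(a) <= 0xDBFF and 0xDC00 <= ord(b) <= 0xDFFF:
            # A lead + trail pair written as two 3-byte sequences: CESU-8, not WTF-8.
            return False
    return True


print(is_wtf8(b"\xed\xa0\x80"))              # True: lone lead surrogate
print(is_wtf8(b"\xed\xa0\xbd\xed\xb8\x80"))  # False: U+1F600 split CESU-8 style
```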

hedora 12 hours ago

Can you still assume the bytes 0x00 and 0xFF are not present in the string (like in UTF-8)?
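
One way to check empirically, assuming WTF-8 differs from UTF-8 only by also allowing the 3-byte encodings of surrogate code points (as the abstract above says): encode every code point except U+0000, surrogates included, and collect the byte values that show up. `bytes_used` is a hypothetical helper written for this check.

```python
def bytes_used() -> set[int]:
    """Every byte value produced when encoding U+0001..U+10FFFF, surrogates included."""
    seen: set[int] = set()
    for cp in range(1, 0x110000):  # skips U+0000 on purpose
        # surrogatepass writes lone surrogates as 3-byte sequences, as WTF-8 does.
        seen.update(chr(cp).encode("utf-8", "surrogatepass"))
    return seen


used = bytes_used()
never_used = sorted(set(range(256)) - used)
print([hex(b) for b in never_used])
```

The unused bytes come out as 0x00, 0xC0, 0xC1 and 0xF5 through 0xFF, so 0xFF never appears. 0x00 appears only for U+0000 itself, and that holds for plain UTF-8 too: a string with no NUL character has no 0x00 byte in either encoding.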