CodesInChaos a day ago
> WTF-8 is a hack intended to be used internally in self-contained systems with components that need to support potentially ill-formed UTF-16 for legacy reasons.

> Any WTF-8 data must be converted to a Unicode encoding at the system’s boundary before being emitted. UTF-8 is recommended. WTF-8 must not be used to represent text in a file format or for transmission over the Internet.

I strongly disagree with that part. When you need to be able to serialize every possible Windows filename, WTF-8 is a great choice. This could be a backup tool, or an NTFS driver for Linux.

I also think Rust's serde should always serialize OsString as a bytestring, using WTF-8 on Windows, instead of the system-dependent union of u16/u8 sequences it currently uses.
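To make the filename case concrete, here is a rough Python sketch of the round trip. It assumes a filename is given as a list of UTF-16 code units; the helper names `utf16_units_to_wtf8` and `wtf8_to_utf16_units` are made up for illustration. Python has no codec named WTF-8, so the sketch leans on the standard `surrogatepass` error handler. As far as I can tell that yields WTF-8-conformant bytes here, because decoding UTF-16 first merges valid surrogate pairs and leaves only unpaired surrogates to be passed through.

```python
def utf16_units_to_wtf8(units: list[int]) -> bytes:
    """Encode possibly ill-formed UTF-16 code units as WTF-8-style bytes."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    # Valid surrogate pairs become one supplementary code point;
    # unpaired surrogates survive as lone surrogate code points.
    text = raw.decode("utf-16-le", "surrogatepass")
    # Lone surrogates are written as 3-byte sequences instead of raising.
    return text.encode("utf-8", "surrogatepass")


def wtf8_to_utf16_units(data: bytes) -> list[int]:
    """Inverse of utf16_units_to_wtf8 for bytes produced by it."""
    text = data.decode("utf-8", "surrogatepass")
    raw = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# "a", an unpaired high surrogate, "b": legal in a Windows filename,
# but not representable in well-formed UTF-8.
name = [0x0061, 0xD800, 0x0062]
encoded = utf16_units_to_wtf8(name)

try:
    encoded.decode("utf-8")  # a strict UTF-8 decoder rejects these bytes
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False
```

The strict decode failing is exactly the containment concern in the spec: these bytes look like UTF-8 but are not, so whoever reads them back needs to know they are WTF-8.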
Rygian a day ago
The way I read the "Intended Audience", I think the use cases you mention are non-goals for WTF-8:

> There is no and will not be any encoding label [ENCODING] or IANA charset alias [CHARSETS] for WTF-8.

The goal is to ensure WTF-8 remains fully contained, so that ill-formed strings don't end up processed by systems that expect well-formed strings.

If you need to serialize every possible Windows filename, then you must also own the corresponding deserializer (i.e., make your solution self-contained), and you cannot expect users to work with the serialized contents using tools you do not control.
RedShift1 a day ago
Which characters are not available in UTF-8 that warrant using WTF-8?