zahlman 4 days ago

I've always taken "WTF-8" to mean that someone had mistakenly interpreted UTF-8 data as being in Latin-1 (or some other code page) and UTF-8 encoded it again.
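
For illustration only, here is a minimal Python sketch of the mix-up described above: UTF-8 bytes are misread as Latin-1 and the result is UTF-8 encoded a second time. The sample string and variable names are assumptions, not something from the thread.

```python
# Hypothetical sample text; any string with non-ASCII characters works.
original = "café"

# Step 1: correct UTF-8 encoding ("é" becomes the two bytes C3 A9).
utf8_bytes = original.encode("utf-8")

# Step 2: the mistake. The bytes are decoded as Latin-1, so each byte
# turns into its own character: 'cafÃ©'.
misread = utf8_bytes.decode("latin-1")

# Step 3: the misread text is UTF-8 encoded again ("double UTF-8").
double_encoded = misread.encode("utf-8")

# Undoing steps 3 and 2 recovers the text, provided no bytes were lost
# or altered along the way.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")

print(double_encoded)        # b'caf\xc3\x83\xc2\xa9'
print(repaired == original)  # True
```

The round trip only works this cleanly with Latin-1, which maps every byte to a character. Other code pages (Windows-1252, for example) leave some bytes undefined, so the repair can fail.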

deathanatos 4 days ago | parent | next [-]

No, WTF-8[1] is a precisely defined format (that isn't that).

If you imagine a format that can encode JavaScript strings containing unpaired surrogates, that's WTF-8. (Well-formed WTF-8 is the same type as a JS string, though with a different representation.)

(Though that would have been a cute name for the UTF-8/latin1/UTF-8 fail.)

[1]: https://simonsapin.github.io/wtf-8/
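
To make the unpaired-surrogate case concrete, here is a small Python sketch. It is not a WTF-8 implementation. It relies on Python's `surrogatepass` error handler, which for a single lone surrogate produces the three bytes of the generalized UTF-8 bit layout. It does not merge surrogate pairs into a four-byte sequence the way the WTF-8 spec requires, so treat it as an illustration of the lone-surrogate case only. The helper name `encode_lone_surrogate` is hypothetical.

```python
def encode_lone_surrogate(code_point: int) -> bytes:
    """Encode one surrogate code point using the generalized UTF-8 layout."""
    if not 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("expected a surrogate code point (U+D800..U+DFFF)")
    # surrogatepass tells the codec to encode the surrogate like any other
    # code point instead of raising an error.
    return chr(code_point).encode("utf-8", "surrogatepass")


lone = "\ud800"  # a lead surrogate with no trail surrogate after it

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(strict_ok)                      # False
print(encode_lone_surrogate(0xD800))  # b'\xed\xa0\x80'
```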

Izkata 4 days ago | parent [-]

GP is right about the original meaning; the author of that page acknowledges hijacking it here: https://news.ycombinator.com/item?id=9611710

zahlman 3 days ago | parent [-]

When I posted that, I was honestly projecting from my own use. I think I may have independently thought of the term on Stack Overflow prior to koalie's tweet, but it's not the easiest thing (by design) to search for comments there (and that's assuming they don't get deleted, which they usually should).

(On review, it appears that the thread mentions much earlier uses...)

Izkata 3 days ago | parent [-]

I did the search because I have a similar memory. I'd place it in the early 2000s, before Stack Overflow existed, around when people were first switching from latin1, Windows-1251, and others to UTF-8 on the web. Browsers would often pick the wrong encoding, and IE had a submenu where you could tell it which one to use on the page. WTF-8 was a thing because occasionally none of these options would work: the server-side layers would be misconfigured and cause the double (or more, if it involved user input) encoding. It was also used just in general to complain about UTF-8 breaking everything as it was slowly being introduced.

chrismorgan 4 days ago | parent | prev | next [-]

That thing was occasionally called WTF-8, but not often—it was normally called “double UTF-8” (if given a name at all).

In the last few years, the name has become very popular with Simon Sapin’s definition.

LocalH 2 days ago | parent | next [-]

Say "double UTF-8" out loud ;)

jibal 4 days ago | parent | prev [-]

"if given a name at all"

https://en.wikipedia.org/wiki/Mojibake

zahlman 3 days ago | parent [-]

This describes a broader concept.
