Yikes, mojibake in 2021.

edit: actually, how did that happen? The apostrophes show up correctly, they’re just all preceded by a Â that doesn’t seem to represent anything?

layer8 an hour ago

The page is declared as ISO 8859-1, but the actual bytes of the text appear to be UTF-8. In UTF-8, the characters from U+0080 to U+00BF are encoded as the byte C2 followed by the code point's own value. For example, U+0092 is encoded as C2 92.
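
The C2 pattern is easy to check with Python's built-in `utf-8` codec. This is a minimal sketch; the helper name `utf8_bytes` is just for illustration.

```python
def utf8_bytes(cp: int) -> bytes:
    """Return the UTF-8 encoding of a single code point."""
    return chr(cp).encode("utf-8")


# U+0080..U+00BF all encode as C2 followed by the code point's own byte value.
for cp in range(0x80, 0xC0):
    assert utf8_bytes(cp) == bytes([0xC2, cp])

# From U+00C0 onward the lead byte becomes C3 (U+00C0 -> C3 80).
assert utf8_bytes(0xC0) == b"\xc3\x80"

print(utf8_bytes(0x92).hex(" "))  # c2 92
```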

C2 in ISO 8859-1 is “Â”. U+0092 is the C1 control code PRIVATE USE TWO in Unicode, and byte 92 is the same control in ISO 8859-1. However, the standard Western Windows code page, 1252, extends ISO 8859-1 by assigning “’” (right single quotation mark, U+2019) to 92.

HTML5/WHATWG requires an ISO 8859-1 charset declaration to be interpreted as Windows-1252 (https://blog.whatwg.org/the-road-to-html-5-character-encodin...), hence the displayed result is “Â’”.
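
The three possible readings of the two bytes on the page can be compared with Python's standard `latin-1`, `cp1252` and `utf-8` codecs. This sketch assumes Python's `cp1252` codec agrees with the browser's Windows-1252 decoder for these particular bytes, which it does for C2 and 92.

```python
raw = b"\xc2\x92"  # the bytes actually found in the page

# Strict ISO 8859-1: C2 -> "Â", 92 -> the invisible C1 control U+0092.
as_latin1 = raw.decode("latin-1")
# Windows-1252, which WHATWG substitutes for an ISO 8859-1 label: 92 -> U+2019.
as_cp1252 = raw.decode("cp1252")
# UTF-8, the encoding the bytes were actually written in: a single control char.
as_utf8 = raw.decode("utf-8")

print(repr(as_latin1))  # 'Â\x92'
print(repr(as_cp1252))  # 'Â’'
print(repr(as_utf8))    # '\x92'
```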

The original Windows-1252 content must have previously been converted to UTF-8 under the assumption that the source was ISO 8859-1, i.e. mapping 92 to U+0092 (Private Use 2) instead of to U+2019 (Right Single Quotation Mark). The resulting UTF-8 bytes were then placed in the web page, which, however, is declared as ISO 8859-1.
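
The hypothesized conversion, next to the correct one, can be sketched in Python. The `repair_c1` helper is hypothetical, not from any library, and assumes every C1 control in the text came from this particular ISO 8859-1 vs. Windows-1252 mix-up.

```python
# Windows-1252 encodes the right single quotation mark as the single byte 0x92.
original = b"\x92"

# Hypothesized faulty conversion: decode as ISO 8859-1, then encode as UTF-8.
wrong = original.decode("latin-1").encode("utf-8")
# Correct conversion: decode as Windows-1252, then encode as UTF-8.
right = original.decode("cp1252").encode("utf-8")

print(wrong.hex(" "))  # c2 92
print(right.hex(" "))  # e2 80 99


def repair_c1(text: str) -> str:
    """Hypothetical repair: reinterpret stray C1 controls as Windows-1252 bytes."""
    out = []
    for ch in text:
        if "\x80" <= ch <= "\x9f":
            try:
                ch = ch.encode("latin-1").decode("cp1252")
            except UnicodeDecodeError:
                pass  # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in cp1252
        out.append(ch)
    return "".join(out)


print(repair_c1(wrong.decode("utf-8")) == "\u2019")  # True
```

The helper leaves the five byte values that Windows-1252 does not assign untouched, since Python's `cp1252` codec refuses to decode them.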

wvbdmp an hour ago

Delicious, thank you!

layer8 37 minutes ago

I edited my post after verifying the actual bytes, it turned out to be slightly more complicated.

well_actulily 28 minutes ago

The double-encoding path gets you there too: the original UTF-8 \xE2 \x80 \x99, mis-decoded as ISO 8859-1 and saved back as UTF-8, becomes \xC3 \xA2 \xC2 \x80 \xC2 \x99, which renders in Windows-1252 as Ã¢Â€Â™. A WYSIWYG cleanup that replaces that mojibake with the Windows-1252 ’ (byte 0x92), with the result then treated as ISO 8859-1 and saved back as UTF-8, gets you to \xC2 \x92 on disk. Edit: Although maybe that's not the most parsimonious explanation.
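
The double-encoding path can be replayed step by step in Python. This sketch also shows that only an ISO 8859-1 mis-decode produces exactly \xC3 \xA2 \xC2 \x80 \xC2 \x99; a Windows-1252 mis-decode maps 0x80 and 0x99 to € and ™ and yields different bytes. The "cleanup" step is modeled as a plain string replacement, which is an assumption about what such a tool would do.

```python
quote = "\u2019"  # right single quotation mark

step1 = quote.encode("utf-8")     # e2 80 99
step2 = step1.decode("latin-1")   # 'â' + U+0080 + U+0099
double = step2.encode("utf-8")    # c3 a2 c2 80 c2 99
shown = double.decode("cp1252")   # what a Windows-1252 reader displays

# The same mis-decode via Windows-1252 instead: 0x80 -> U+20AC, 0x99 -> U+2122.
via_cp1252 = step1.decode("cp1252").encode("utf-8")

# Hypothesized cleanup: swap the mojibake for byte 0x92 read as ISO 8859-1
# (i.e. U+0092), then save the text as UTF-8.
MOJIBAKE = "\u00c3\u00a2\u00c2\u20ac\u00c2\u2122"  # "Ã¢Â€Â™"
text = "it" + shown + "s"
cleaned = text.replace(MOJIBAKE, "\x92").encode("utf-8")

print(double.hex(" "))      # c3 a2 c2 80 c2 99
print(via_cp1252.hex(" "))  # c3 a2 e2 82 ac e2 84 a2
print(cleaned)              # b'it\xc2\x92s'
```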

root-parent 36 minutes ago

This one does g11n...

netsharc an hour ago

They're probably Microsoft's "Smart Quotes", which are non-ASCII characters. They were presumably stored as UTF-8 but read back as ISO 8859-1.