Remix clone Hacker News


	▲	staplung 5 days ago
		Yes, but the box drawing characters in "ASCII" are all above 127, so they don't encode the same way. So that last AI-generated sentence is basically false (or really misleading): ASCII files that consist only of bytes in the range 0-127 will also be valid UTF-8, but "ASCII" files that use bytes above 127 generally will not be. Now, technically, ASCII only defines the lower 128 code points (0-127). There's no single standard definition of what the upper half of the byte space represents in ASCII itself, so technically it's true that all valid ASCII files are valid UTF-8. By the same logic, however, the box drawing characters are not ASCII. They're actually part of something called code page 437, which maps those bit patterns to box drawing characters. With other code pages they map to something else, often non-Latin characters or ones with accents. So the name ASCII flow is misleading, and the output options are too. ;-) Basically, if the high bit of a byte is set in UTF-8, that byte is part of a multi-byte sequence encoding a single code point.
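A quick way to check the claim above, as a rough sketch rather than anything the tool itself does: the helper `is_valid_utf8` and the sample strings are made up for illustration, and the only assumption is Python's built-in `ascii`, `cp437`, and `utf-8` codecs.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Plain 7-bit ASCII art: every byte is below 0x80, so it is also valid UTF-8.
ascii_box = "+--+\n|  |\n+--+"
ascii_bytes = ascii_box.encode("ascii")

# The same kind of box drawn with box drawing characters.
fancy_box = "┌─┐"
cp437_bytes = fancy_box.encode("cp437")  # one byte per character, all above 0x7F
utf8_bytes = fancy_box.encode("utf-8")   # several bytes per character

print(is_valid_utf8(ascii_bytes))   # ASCII-only bytes
print(is_valid_utf8(cp437_bytes))   # code page 437 bytes
print(is_valid_utf8(utf8_bytes))    # real UTF-8 bytes
print(all(b >= 0x80 for b in utf8_bytes))  # high bit set on every byte?
```

The code page 437 bytes are rejected because each one has the high bit set but they don't form well-formed multi-byte sequences, while the UTF-8 encoding of the same characters uses multi-byte sequences in which every byte has the high bit set.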
	▲	ilovetux 5 days ago
		Granted, all of that is true, but GP specifically differentiated between ASCII and ASCII Extended. GP then went on to say that after choosing the ASCII option and pasting the text into a text editor on Mac, it was reported as UTF-8. My point was that this is expected: if the ASCII option is chosen rather than the ASCII Extended option, what he ends up with (ASCII) is valid UTF-8, just as the text editor reported.