Remix clone Hacker News

	▲	rmunn 3 days ago
		That's not what I'm saying at all. I'm saying that, in the absence of a BOM, a Unicode-aware app should guess UTF-8 first and other likely encodings second, because the chance of a false positive on the "is this UTF-8?" guess is practically indistinguishable from zero. If the data isn't UTF-8, the UTF-8 parsing attempt is nearly guaranteed to fail, so it's safe to try first. I'm also saying that apps should no longer write a BOM (in UTF-8 only, not in UTF-16, where it's needed to signal byte order), because BOMs cost more to deal with than they're worth, except in certain specific circumstances, like having to deal with pre-Unicode apps that default to assuming 8-bit encodings.
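		A rough sketch of the "guess UTF-8 first" idea in Python, in case it helps. The function name `decode_guess` and the fallback list (cp1252, then latin-1) are my own assumptions for illustration; which legacy encodings are "likely" depends on where your data comes from.

```python
def decode_guess(data: bytes, fallbacks=("cp1252",)):
    """Return (text, encoding_name), trying strict UTF-8 before any fallback.

    Hypothetical helper: the fallback order is an assumption, not a standard.
    """
    try:
        # Strict decoding rejects any byte sequence that is not valid UTF-8,
        # so text in a legacy 8-bit encoding with non-ASCII bytes almost
        # always fails here instead of being misread.
        return data.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        pass

    for name in fallbacks:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue

    # latin-1 maps every byte to a code point, so it never fails;
    # it is only a last resort and may well be the wrong guess.
    return data.decode("latin-1"), "latin-1"
```

		Pure ASCII input decodes identically under all of these, so it is simply reported as UTF-8.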
	▲	mikelabatt 2 days ago
		Makes sense, thank you. The observation that false positives for UTF-8 tend to zero helps me understand. So I will vote for UTF-8 without a BOM from now on (while encouraging parsers to deal with one, if present).
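		For what it's worth, here is a minimal Python sketch of that policy: write UTF-8 without a BOM, but tolerate one when reading. The helper names `read_text` and `write_text` are made up for illustration; the `utf-8-sig` codec is Python's standard codec that skips a leading BOM when decoding.

```python
import codecs


def read_text(data: bytes) -> str:
    # "utf-8-sig" drops a leading UTF-8 BOM if one is present and
    # otherwise behaves like plain UTF-8 decoding.
    return data.decode("utf-8-sig")


def write_text(text: str) -> bytes:
    # The plain "utf-8" codec never emits a BOM when encoding.
    return text.encode("utf-8")


with_bom = codecs.BOM_UTF8 + "héllo".encode("utf-8")
without_bom = "héllo".encode("utf-8")

print(read_text(with_bom) == read_text(without_bom))
print(write_text("héllo").startswith(codecs.BOM_UTF8))
```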