mikelabatt 3 days ago
In a pure UTF-8 world we would not need it, sure. I get that point. But what do you want to do with the 40+ years' worth of text files that came after 7-bit ASCII and that may coexist with UTF-8? If we want to preserve our past, the practical solution is for the OS or app to have a default character set for 8-bit text encoding, in addition to supporting (and defaulting to) UTF-8. I also agree that "BOM" is the wrong name for a UTF-8... BOM. Byte order is not the issue. But still, it's a header that says that the file, even if empty, is UTF-8. Detecting an 8-bit legacy character set is much more difficult than recognizing (and skipping) a BOM.
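To make the fallback idea concrete, here is a minimal Python sketch of what such a reader might look like. It is only an illustration under assumptions: the function name `decode_text` and the choice of `cp1252` as the legacy default are hypothetical, and a real OS or app would pick its own default 8-bit character set.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def decode_text(data: bytes, legacy_encoding: str = "cp1252"):
    """Return (text, encoding_used) for a blob of text bytes.

    legacy_encoding is an assumed default for old 8-bit files;
    nothing in the bytes themselves tells us which one is right.
    """
    # Case 1: the signature is present, so the file declares itself UTF-8
    # (this works even when nothing follows the signature).
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):].decode("utf-8"), "utf-8-sig"
    # Case 2: no signature, but the bytes are valid UTF-8.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Case 3: fall back to the configured 8-bit default. This is a
        # guess, so undefined bytes are replaced instead of raising.
        return data.decode(legacy_encoding, errors="replace"), legacy_encoding
```

Checking for the signature is a three-byte prefix comparison. The third case is the hard one: it does not detect the legacy character set at all, it just applies whatever default was configured.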
cryptonector 2 days ago
UTF-8 does not need a BOM at all and never needed it, for two reasons:

- first, byte order doesn't affect the UTF-8 encoding,
- second, the codeset metadata problem you're trying to solve already existed before UTF-8 entered the scene and still exists after it -- you just have to know whether some text file (or whatever) uses UTF-8, ISO 8859-x, SHIFT-JIS, UTF-16, etc.

The second point addresses your concern, but that metadata has to be out of band. Putting it in-band creates the sorts of problems that others have pointed out, and it becomes an annoyance once all non-Unicode locales are gone. And since the goal is to have Unicode replace all other codesets, and since we've made a great deal of progress in that direction, there is no need now to add this wart.
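One commonly cited example of an in-band problem is plain byte concatenation. The Python sketch below is an assumption about the kind of issue meant here, not a quote from the thread, and the names `part_a`, `part_b` and `joined` are made up for illustration.

```python
import codecs

BOM = codecs.BOM_UTF8  # EF BB BF

# Two files, each written with a leading UTF-8 signature.
part_a = BOM + "first\n".encode("utf-8")
part_b = BOM + "second\n".encode("utf-8")

# Byte-level concatenation, as a tool like `cat` would do.
joined = part_a + part_b

# The "utf-8-sig" codec strips only a signature at the very start,
# so the second one survives as a U+FEFF character mid-text.
text = joined.decode("utf-8-sig")
stray_marks = text.count("\ufeff")

# With the encoding known out of band, the files carry no marker
# and concatenation is clean.
clean = ("first\n".encode("utf-8") + "second\n".encode("utf-8")).decode("utf-8")
```

Here `text` ends up containing one stray U+FEFF before "second", while `clean` contains none.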