▲ | wrp 4 days ago
I need to call out a myth about UTF-8. Tools built to assume UTF-8 are not backwards compatible with ASCII. An encoding INCLUDES but also EXCLUDES. When a tool is set to use UTF-8, it will process an ASCII stream, but it will not filter out non-ASCII. I still use some tools that assume ASCII input. For many years now, Linux tools have been removing the ability to specify default ASCII, leaving UTF-8 as the only relevant choice. This has caused me extra work, because if the data processing chain goes through these tools, I have to manually inspect the data for non-ASCII noise that has been introduced. I mostly use those older tools on Windows now, because most Windows tools still allow you to set default ASCII.
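For anyone who wants to automate that inspection step, here is a minimal sketch of a byte-level check. It is not taken from any particular tool; `find_non_ascii` is a hypothetical helper name, and it assumes the data is available as raw bytes.

```python
def find_non_ascii(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte outside 7-bit ASCII."""
    return [(i, b) for i, b in enumerate(data) if b > 0x7F]


def is_pure_ascii(data: bytes) -> bool:
    """True when no byte has the high bit set."""
    return not find_non_ascii(data)


# Illustrative input: ASCII text followed by a UTF-8 encoded "é" (0xC3 0xA9).
sample = b"abc\xc3\xa9"
for offset, value in find_non_ascii(sample):
    print(f"non-ASCII byte 0x{value:02X} at offset {offset}")
```

Running a check like this at the end of the chain would flag the stray bytes without a manual read-through, though it cannot say which tool introduced them.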
▲ | account42 a day ago
Do you have an actual example where this causes an issue? "ASCII" tools mostly just passed along non-ASCII bytes unchanged even before UTF-8.
▲ | int_19h 3 days ago
The usual statement isn't that UTF-8 is backwards compatible with ASCII (it's obvious that any 8-bit encoding wouldn't be; that's why we have UTF-7!). It's that UTF-8 is backwards compatible with tools that are 8-bit clean.
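A rough illustration of what "8-bit clean" buys you, as a sketch rather than a description of any specific tool: every byte of a multi-byte UTF-8 sequence has the high bit set, so code that splits on an ASCII delimiter and passes other bytes through untouched never cuts a character in half. The sample strings and the helper name `multibyte_bytes_have_high_bit` are made up for the example.

```python
def multibyte_bytes_have_high_bit(text: str) -> bool:
    """Check that every byte of each non-ASCII character is >= 0x80."""
    return all(
        b >= 0x80
        for ch in text
        if ord(ch) > 0x7F
        for b in ch.encode("utf-8")
    )


# Split on an ASCII comma at the byte level, with no decoding step.
line = "naïve,café,日本語".encode("utf-8")
fields = line.split(b",")

# Each field is still valid UTF-8 because no multi-byte sequence
# can contain the byte 0x2C (",").
decoded = [f.decode("utf-8") for f in fields]
print(decoded)
```

This only shows the pass-through property; it does nothing to reject non-ASCII input, which is the separate concern raised at the top of the thread.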
▲ | kccqzy 4 days ago
That's not a myth about UTF-8. That's a decision by tools not to support pure ASCII. |