chrismorgan (a day ago):
Python strings aren’t even proper Unicode strings. They’re sequences of code points rather than scalar values, meaning they can contain surrogates. This is incompatible with basically everything: UTF-* as used by sensible things, and unvalidated UTF-16 as used in the likes of JavaScript, Windows wide strings and Qt.
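The following is a quick Python illustration of that point. It is a sketch run under CPython 3, and the example strings are arbitrary. A `str` can hold a lone surrogate, and strict UTF-8 refuses to encode it. A surrogate pair written in Python stays two code points. Once it passes through UTF-16 code units, though, it collapses into one character.

```python
# A Python str is a sequence of code points, so a lone surrogate is a valid str.
lone = "\ud800"
print(len(lone))  # 1

# Strict UTF-8 refuses it: surrogates are not Unicode scalar values.
try:
    lone.encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed

# A high/low surrogate pair written in Python stays two separate code points...
pair = "\ud83d\ude00"
print(len(pair), pair == "\U0001F600")  # 2 False

# ...but written out as UTF-16 code units (the form JavaScript strings and
# Windows wide strings hold) and read back, it becomes the single U+1F600.
units = pair.encode("utf-16-le", "surrogatepass")
print(units.decode("utf-16-le") == "\U0001F600")  # True
```

The same sequence of code points therefore means different things depending on which side of the conversion you are on. That is the incompatibility described above.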
nilslindemann (a day ago):
But isn't 'surrogateescape' supposed to address this? (no expert) https://vstinner.github.io/pep-383.html
chrismorgan (a day ago):
surrogateescape is something else altogether. It’s a hack to allow non-Unicode file names, environment variables and command-line arguments in an otherwise-Unicode environment. Each undecodable byte (0x80 to 0xFF) is smuggled through part of the surrogate range (U+DC80 to U+DCFF), which otherwise can’t occur, since surrogates are not valid Unicode scalar values. It’s a cunning hack that makes a lot of sense: they used a design error in one place (Python’s string representation) to cancel out a design error in another (POSIX being late to the game on Unicode)!
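You can try the round trip described above directly with Python's built-in `surrogateescape` error handler. This is a sketch, and the Latin-1 byte string standing in for a legacy file name is a made-up example. On POSIX systems, `os.fsdecode` and `os.fsencode` normally apply this same handler to file names.

```python
# Bytes that are not valid UTF-8: "café" encoded as Latin-1, as a file name
# created under a legacy locale might be (hypothetical example data).
raw = b"caf\xe9"

# Strict UTF-8 decoding fails on the stray 0xE9 byte...
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# ...but surrogateescape maps each undecodable byte 0xXX to U+DCXX.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\udce9'

# Encoding with the same handler turns the surrogate back into the byte,
# so the original bytes survive the trip through str unchanged.
print(name.encode("utf-8", "surrogateescape") == raw)  # True

# Strict UTF-8 encoding still rejects the smuggled surrogate.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed
```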
Dylan16807 (19 hours ago):
It's not taking advantage of the weird way Python strings work. You can put that hack on top of any string format that converts back and forth with Unicode.
|
|