mid-kid 7 days ago
Yeah, I have no idea what is wrong with that. Python simply operates on arrays of codepoints, which are a stable representation that can be converted to a bunch of encodings including "proper" utf-8, as long as all codepoints are representable in that encoding. This also allows you to work with strings that contain arbitrary data falling outside of the unicode spectrum.
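A minimal sketch of what that looks like in practice, assuming CPython 3 and using a lone surrogate as the "arbitrary data" (the helper name `try_utf8` is made up for illustration). It shows a `str` that holds such a codepoint, that strict UTF-8 encoding rejects it, and that the standard `surrogateescape` error handler round-trips it to the original byte:

```python
# A lone surrogate is a legal element of a Python str,
# but it is not a Unicode scalar value, so strict UTF-8 cannot encode it.
s = "abc\udc80"


def try_utf8(text):
    """Hypothetical helper: return the UTF-8 bytes, or None if encoding fails."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


# Strict encoding fails because of the lone surrogate.
strict = try_utf8(s)

# The surrogateescape handler maps U+DC80..U+DCFF back to bytes 0x80..0xFF.
escaped = s.encode("utf-8", errors="surrogateescape")

# Decoding invalid UTF-8 with the same handler produces the surrogate again.
roundtrip = b"abc\x80".decode("utf-8", errors="surrogateescape")

print(strict, escaped, roundtrip == s)
```

So the "as long as all codepoints are representable" condition is a real one: the string above is a perfectly good `str`, but it only converts to UTF-8 if you opt into a non-strict error handler.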
deathanatos 6 days ago
> which are a stable representation that can be converted to a bunch of encodings including "proper" utf-8, as long as all codepoints are representable in that encoding.

Which, to humor the parent, is also true of raw bytes strings. One of the (valid) points raised by the gist is that `str` is not infallibly encodable to UTF-8, since it can contain values that are not valid Unicode.

> This also allows you to work with strings that contain arbitrary data falling outside of the unicode spectrum.

If I write,

…

I want the input string to be Unicode. If I need "Unicode, or maybe with bullshit mixed in", that can be a different type, and then I can take
acuozzo 6 days ago
> Python simply operates on arrays of codepoints

But most programmers think in arrays of grapheme clusters, whether they know it or not.
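A small illustration of the gap, as a sketch using only the standard `unicodedata` module (the variable names are just for the example). The same user-visible character "é" can be one codepoint or two, and codepoint-level operations such as `len` and slicing see the difference:

```python
import unicodedata

# One user-perceived character, two possible codepoint sequences.
composed = "\u00e9"                                  # U+00E9, precomposed
decomposed = unicodedata.normalize("NFD", composed)  # U+0065 followed by U+0301

# len() counts codepoints, not grapheme clusters.
composed_len = len(composed)
decomposed_len = len(decomposed)

# Reversing by codepoint detaches the combining accent from its base letter.
reversed_text = decomposed[::-1]

print(composed_len, decomposed_len, composed == decomposed)
```

Both strings render as a single "é", yet they compare unequal and have different lengths until they are normalized to the same form.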