Remix.run Logo
account42 7 days ago

It conceptually uses arrays of code points, which need up to 24 bits. Optimizing the storage to use smaller integers when possible is an implementation detail.

jibal 7 days ago | parent | next [-]

Python3 is specified to use arrays of 8, 16, or 32 bit units, depending on the largest code point in the string. As a result, all code points in all strings are O(1) indexable. The claim that "Python 3 internally uses UTF-32" is simply false.

zahlman 6 days ago | parent | prev [-]

> code points, which need up to 24 bits

They need at most 21 bits. The bits may only be available in multiples of 8, but the implementation also doesn't byte-pack them into 24-bit units, so that's moot.