Remix clone Hacker News

	▲	msl 4 hours ago
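		UTF-32 allows constant-time access to code points. Neither UTF-8 nor UTF-16 can do the same, since both are variable-width: Unicode defines 1,114,112 code points (U+0000 through U+10FFFF, slightly more than 2 to the power of 20, though not all are in use), which is more than a single 16-bit unit can hold. Constant-time access to code points is still not constant-time access to characters, though. While most characters can be encoded as a single code point, Python does not normalize strings, so there is no guarantee that even relatively normal characters are actually stored as single code points. Try this in Python: `s = "a\u0308"; print(s); print(s[0])` You will see `ä` on the first line and `a` on the second.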
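
To make the normalization point concrete, here is a minimal sketch using the standard library's `unicodedata` module. The variable names `decomposed` and `composed` are just illustrative, and NFC is only one of several normalization forms you might pick.

```python
import unicodedata

# 'a' followed by U+0308 COMBINING DIAERESIS: two code points, one visible character
decomposed = "a\u0308"

# NFC composes the pair into the single code point U+00E4 where one exists
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))   # 2 1
print(decomposed == composed)           # False: Python compares code points, not characters
print(decomposed[0], composed[0])       # a ä
```

So indexing by code point only lines up with what a reader would call a character if you normalize first, and even then only for characters that have a precomposed form.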