Remix clone Hacker News


	▲	amluto 2 hours ago
		Modern string libraries largely use UTF-8 [0], and surrogates, regardless of whether they’re paired, are invalid in UTF-8. So, in a modern string library, as built in to most modern languages, you will not encounter surrogates except when translating between encodings.
		[0] But everyone disagrees as to what indexing a string means, so you need to make an actual choice if you want anything involving indexing to match across languages.
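To make the indexing disagreement concrete, here is a minimal Java sketch that measures the same string three ways. The class and method names (`StringLengths`, `utf16Units`, `codePoints`, `utf8Bytes`) are made up for illustration; only the standard-library calls are real.

```java
import java.nio.charset.StandardCharsets;

class StringLengths {
    // UTF-16 code units: the unit Java's String.length() and charAt() use.
    static int utf16Units(String s) {
        return s.length();
    }

    // Unicode code points: a valid surrogate pair counts as one.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Bytes in the UTF-8 encoding of the string.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which lies outside the BMP.
        String s = "a" + new String(Character.toChars(0x1F600));
        System.out.println(utf16Units(s)); // 3
        System.out.println(codePoints(s)); // 2
        System.out.println(utf8Bytes(s));  // 5
    }
}
```

Three reasonable definitions of "length" give three different answers for a two-character string, so an index that means one thing in one language can point somewhere else in another.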
	▲	chuckadams an hour ago
		> surrogates, regardless of whether they’re paired, are invalid in UTF-8
		Java did not get the memo. Since the char type is fixed at 16 bits, it uses surrogates to encode everything outside the BMP, regardless of the encoding.
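A small Java sketch of what this looks like in practice, assuming a stock JDK. The class and helper names (`SurrogateDemo`, `isSurrogatePair`, `encodesToUtf8`) are hypothetical. It checks that a code point outside the BMP is held in a `String` as two surrogate `char` values, and then asks the standard strict UTF-8 encoder whether it accepts a given string.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class SurrogateDemo {
    // True if the string is exactly one high surrogate followed by one low surrogate.
    static boolean isSurrogatePair(String s) {
        return s.length() == 2
                && Character.isHighSurrogate(s.charAt(0))
                && Character.isLowSurrogate(s.charAt(1));
    }

    // True if a strict UTF-8 encoder accepts the string.
    // A fresh CharsetEncoder reports malformed input instead of replacing it.
    static boolean encodesToUtf8(String s) {
        try {
            StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600)); // U+1F600
        String lone = String.valueOf((char) 0xD83D);           // unpaired high surrogate

        System.out.println(isSurrogatePair(emoji)); // true
        System.out.println(encodesToUtf8(emoji));   // true
        System.out.println(encodesToUtf8(lone));    // false
    }
}
```

Under these assumptions the surrogates live in the in-memory `char` sequence. The strict encoder turns a valid pair into a single four-byte UTF-8 sequence and rejects an unpaired one.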