Remix clone Hacker News

new | show | ask | jobs Github

	▲	Mikhail_Edoshin 4 days ago
		No, UTF-16 is much simpler in that aspect. And its design is no less brilliant. (I've written an state machine encoder and decoder for both these encodings.) If an application works a lot with text I'd say UTF-16 looks more attractive for the main internal representation.
	▲	rmunn 4 days ago \| parent [-]
		UTF-16 is simpler most of the time, and that's precisely the problem. Anyone working with UTF-8 knows they will have to deal with multibyte codepoints. People working with UTF-16 often forget about surrogate characters, because they're a lot rarer in most major languages, and then end up with bugs when their users put emoji into a text field.