
	▲	Sharlin 4 days ago
		The point is that you don’t have a "seek" operation available. You are given a bytestream and aren’t told whether you’re at the start, at a valid position between code points, or in the middle of a code point. UTF-8’s self-synchronizing property means that by reading a single byte you immediately know whether you’re in the middle of a code point, and that by reading and discarding at most two additional bytes you’re synchronized and can start (or resume) decoding. That wouldn’t be possible if continuation bytes used all their bits for payload.
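A minimal Python sketch of the resynchronization described above, assuming the input is well-formed UTF-8 and that decoding may begin at an arbitrary byte offset. The helper names `is_continuation`, `resync`, and `decode_from` are hypothetical, chosen for illustration only. The key fact is that every UTF-8 continuation byte has the bit pattern `10xxxxxx`, so one byte is enough to tell whether you are mid-code-point.

```python
def is_continuation(b: int) -> bool:
    # UTF-8 continuation bytes always match the bit pattern 10xxxxxx.
    return b & 0xC0 == 0x80


def resync(stream: bytes, start: int) -> int:
    # Skip continuation bytes until reaching a byte that can begin a
    # code point (an ASCII byte or a multi-byte lead byte).
    i = start
    while i < len(stream) and is_continuation(stream[i]):
        i += 1
    return i


def decode_from(stream: bytes, start: int) -> tuple[int, str]:
    # Returns (number of bytes discarded, decoded text from the boundary on).
    pos = resync(stream, start)
    return pos - start, stream[pos:].decode("utf-8")


if __name__ == "__main__":
    # 'a' (1 byte), '€' (3 bytes), '😀' (4 bytes), 'b' (1 byte)
    data = "a€😀b".encode("utf-8")
    for offset in range(len(data)):
        skipped, text = decode_from(data, offset)
        print(offset, skipped, repr(text))
```

Dropping in at offset 5, inside the four-byte emoji, discards three continuation bytes (the first one read plus two more) and resumes at `b`. In well-formed UTF-8 no offset ever requires discarding more than three bytes, because a code point is at most four bytes long.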
	▲	Dylan16807 3 days ago | parent
		Yes, the point is being able to synchronize. But it doesn't matter if it takes 1 byte or 3 bytes to synchronize. And being unable to read backwards is not a problem. (EMBL doesn't synchronize in three bytes but other encodings do.)