rhet0rica 4 days ago

See quectophoton's comment: the requirement that continuation bytes always start with the bits 10 is useful if a parser jumps in at a random offset or, more commonly, if the text stream gets fragmented. This was a major concern when UTF-8 was devised in the early 1990s, as transmission was much less reliable than it is today.
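To make the resynchronization point concrete, here is a rough Python sketch. The helper name `resync` is my own, not from any library, and it assumes the bytes are otherwise valid UTF-8: it just skips forward past any byte whose top two bits are 10 until it reaches the start of a character.

```python
def resync(data: bytes, offset: int) -> int:
    """Return the first index >= offset that is not a continuation byte.

    Hypothetical helper: assumes `data` is otherwise well-formed UTF-8.
    """
    # Continuation bytes have the bit pattern 10xxxxxx, i.e. (b & 0xC0) == 0x80.
    while offset < len(data) and (data[offset] & 0xC0) == 0x80:
        offset += 1
    return offset


# 'a' is 1 byte, 'é' is 2 bytes, '€' is 3 bytes, 'b' is 1 byte.
data = "aé€b".encode("utf-8")

# Landing in the middle of 'é' (index 2) skips ahead to the start of '€'.
start = resync(data, 2)
tail = data[start:].decode("utf-8")
```

A fragment that starts mid-character loses at most that one character. Everything after the next lead byte decodes normally.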

rhet0rica 2 hours ago

Addendum: This was posted to the front page today: https://doc.cat-v.org/bell_labs/utf-8_history

It also notes that UTF-8 protects against stray NUL and '/' bytes turning up in filenames as part of a multibyte character, which would break C strings and Unix path handling, respectively.
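A quick way to check the NUL and '/' property yourself, as a sketch rather than a proof: the Python below sweeps every encodable code point and looks for ASCII-range bytes inside multibyte sequences. The names `has_ascii_byte_in_multibyte` and `offenders` are mine. It then contrasts that with UTF-16 for U+2F00, a code point I picked because its two bytes happen to be 0x2F and 0x00.

```python
def has_ascii_byte_in_multibyte(ch: str) -> bool:
    """True if a multibyte UTF-8 encoding of `ch` contains a byte below 0x80."""
    encoded = ch.encode("utf-8")
    return len(encoded) > 1 and any(b < 0x80 for b in encoded)


# Sweep all code points, skipping surrogates (they cannot be encoded).
offenders = [
    cp
    for cp in range(0x110000)
    if not 0xD800 <= cp <= 0xDFFF and has_ascii_byte_in_multibyte(chr(cp))
]

# Contrast: U+2F00 in UTF-16 (little-endian) is the byte pair 00 2F,
# so it contains both a NUL byte and a '/' byte.
utf16 = "\u2f00".encode("utf-16-le")
utf8 = "\u2f00".encode("utf-8")
```

`offenders` comes out empty, while `utf16` contains both bytes that C string functions and path parsers treat specially.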