rhet0rica 4 days ago
See quectophoton's comment: the requirement that every continuation byte starts with the bits 10 is useful when a parser jumps in at a random offset or, more commonly, when the text stream gets fragmented. The parser can skip any bytes of the form 10xxxxxx until it reaches a lead byte and resume decoding from there. This was a major concern when UTF-8 was devised in the early 90s, as transmission was much less reliable than it is today.
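To make the resynchronization point concrete, here is a minimal Python sketch. The helper names `is_continuation` and `resync` are made up for illustration and are not part of any library; the sketch assumes the input is otherwise valid UTF-8 and only handles landing in the middle of a character.

```python
def is_continuation(byte: int) -> bool:
    # Continuation bytes always have the bit pattern 10xxxxxx.
    return (byte & 0b1100_0000) == 0b1000_0000


def resync(data: bytes, offset: int) -> int:
    """Return the first index >= offset that is not a continuation byte.

    Hypothetical helper: assumes `data` is valid UTF-8 apart from the
    arbitrary starting offset.
    """
    while offset < len(data) and is_continuation(data[offset]):
        offset += 1
    return offset


data = "naïve café".encode("utf-8")

# Offset 3 is the second byte of "ï" (0xC3 0xAF), so decoding from
# there fails, while decoding from the resynchronized offset works.
start = resync(data, 3)
tail = data[start:].decode("utf-8")
```

Because a UTF-8 sequence has at most three continuation bytes, the loop skips at most three bytes before finding the next character boundary.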
rhet0rica 2 hours ago
Addendum: This was posted to the front page today: https://doc.cat-v.org/bell_labs/utf-8_history It also notes that UTF-8 never uses the bytes for NUL or '/' inside a multibyte sequence, so neither can show up spuriously in filenames, where they would break C strings and Unix path handling, respectively.
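The NUL and '/' property can be checked by brute force. The sketch below uses only Python's built-in codec; the function name `has_ascii_byte` and the sample filename are made up for illustration. It relies on the fact that every byte of a multibyte UTF-8 sequence has its high bit set, so no such byte can equal 0x00 or 0x2F.

```python
def has_ascii_byte(cp: int) -> bool:
    # True if the UTF-8 encoding of this code point contains any byte
    # below 0x80, which is the range that includes NUL and '/'.
    return min(chr(cp).encode("utf-8")) < 0x80


# Check every non-ASCII code point, skipping the surrogate range,
# which cannot be encoded as UTF-8.
offenders = [
    cp
    for cp in range(0x80, 0x110000)
    if not (0xD800 <= cp <= 0xDFFF) and has_ascii_byte(cp)
]

# A path with non-ASCII components still splits on exactly one '/'.
path = "données/été.txt".encode("utf-8")
parts = path.split(b"/")
```

`offenders` comes out empty, so a byte-oriented path splitter or a C-style string routine never sees a false separator or terminator inside a non-ASCII character.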