spankalee 4 days ago
Wouldn't you only need to read backwards at most 3 bytes to see if you were currently at a continuation byte? With a max multi-byte size of 4 bytes, if you don't see a multi-byte start character by then, you would know it's a single-byte char. I wonder if a reason is similar, though: error recovery when working with libraries that aren't UTF-8 aware. If you naively slice an array of UTF-8 bytes, a UTF-8-aware library can ignore malformed leading and trailing bytes and get some reasonable string out of it.
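
To make the two ideas in this comment concrete, here is a rough Python sketch. The helper names (`is_continuation`, `char_start`, `lenient_decode`) are made up for illustration and do not come from any library. It assumes the whole buffer is in memory so that stepping backwards is possible, and it only looks for a lead byte; it does not check that the lead byte's declared length actually covers the starting index.

```python
def is_continuation(b: int) -> bool:
    """True if b has the bit pattern 10xxxxxx (a UTF-8 continuation byte)."""
    return (b & 0xC0) == 0x80


def char_start(buf: bytes, i: int) -> int:
    """Return the index of the first byte of the code point covering buf[i].

    Steps back at most 3 bytes, because a UTF-8 sequence is at most 4 bytes.
    Raises ValueError if no non-continuation byte is found in that window.
    """
    for back in range(4):
        j = i - back
        if j < 0:
            break  # ran off the front of the buffer
        if not is_continuation(buf[j]):
            return j
    raise ValueError("no lead byte within 3 bytes; input is not valid UTF-8")


def lenient_decode(chunk: bytes) -> str:
    """Decode a possibly mis-sliced chunk, dropping malformed bytes.

    Uses the standard errors="ignore" handler, so stray continuation bytes
    at the front and a truncated sequence at the end are skipped.
    """
    return chunk.decode("utf-8", errors="ignore")


data = "aé€😀".encode("utf-8")  # 1 + 2 + 3 + 4 = 10 bytes
naive_slice = data[2:9]           # cuts through "é" and through "😀"
recovered = lenient_decode(naive_slice)
```

With this input, `char_start(data, 9)` lands on index 6 (the lead byte of the emoji), and `recovered` is just `"€"`, the only character that survived the slice intact.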
Sharlin 4 days ago
It’s not always possible to read backwards.