▲ | chrismorgan 4 days ago
> indexing by bytes instead of UTF-8 code units

When the encoding is UTF-8 (which it is here), the code unit is the byte. They called the fields byteStart and byteEnd, but more technically precise labels (no more or less accurate, just more precise) would be utf8CodeUnitStart and utf8CodeUnitEnd.
▲ | psionides 4 days ago
Sorry, I keep mixing these up. I meant bytes instead of Unicode scalars, which I think would be more natural to iterate over in most languages (at least the ones I use).
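To make the distinction concrete, here is a rough Python sketch of how the counts diverge for one string, and how an index counted in scalars could be converted to a UTF-8 byte offset of the kind byteStart and byteEnd hold. The helper names `scalar_to_byte_offset` and `byte_to_scalar_offset` and the sample string are made up for illustration; they are not part of any API discussed in the thread.

```python
# Python's str is indexed by code point, which for well-formed text
# (no lone surrogates) is the same as indexing by Unicode scalar value.
text = "héllo 👋"
utf8 = text.encode("utf-8")

scalar_count = len(text)                             # scalars (code points)
byte_count = len(utf8)                               # UTF-8 code units == bytes
utf16_units = len(text.encode("utf-16-le")) // 2     # UTF-16 code units


def scalar_to_byte_offset(s: str, scalar_index: int) -> int:
    """Hypothetical helper: UTF-8 byte offset of the scalar at scalar_index."""
    return len(s[:scalar_index].encode("utf-8"))


def byte_to_scalar_offset(s: str, byte_index: int) -> int:
    """Hypothetical helper: scalar index for a UTF-8 byte offset.

    Raises UnicodeDecodeError if byte_index falls inside a multi-byte sequence.
    """
    return len(s.encode("utf-8")[:byte_index].decode("utf-8"))


# Byte range covering the emoji, which sits at scalar index 6.
byte_start = scalar_to_byte_offset(text, 6)
byte_end = scalar_to_byte_offset(text, 7)
emoji = utf8[byte_start:byte_end].decode("utf-8")
```

For this string there are 7 scalars, 11 UTF-8 bytes, and 8 UTF-16 code units, so the same position gets a different number depending on which unit is counted. Converting between them means walking the string from the start, which is presumably why a format has to pick one unit and document it.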