▲ | Mikhail_Edoshin 4 days ago | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I once saw a good byte encoding for Unicode: 7 bit for data, 1 for continuation/stop. This gives 21 bit for data, which is enough for the whole range. ASCII compatible, at most 3 bytes per character. Very simple: the description is sufficient to implement it. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | rmunn 4 days ago | parent | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Probably a good idea, but when UTF-8 was designed the Unicode committee had not yet made the mistake of limiting the character range to 21 bits. (Going into why it's a mistake would make this comment longer than it's worth, so I'll only expound on it if anyone asks me to). And at this point it would be a bad idea to switch away from the format that is now, finally, used in over 99% of all documents online. The gain would be small (not zero, but small) and the cost would be immense. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | restalis 3 days ago | parent | prev [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This fits your description: https://en.wikipedia.org/wiki/Variable-length_quantity |