theobeers 7 hours ago
Submission statement: Figuring out how to develop a Unicode collator from scratch for a research group that I was working with in Berlin was one of my formative experiences as a programmer. Ever since then, I've wanted to write something to collect my thoughts on the Unicode Collation Algorithm and the process of building a conformant implementation. Last summer I had a good excuse to do this, when I decided to adapt my collator to Zig as a way of learning that language. The Unicode standards, and the (relatively) low-level software libraries based on them, do a lot of things for us to make computing possible. We have the luxury of not needing to worry about most of those things most of the time. I find it humbling whenever I do peek under the hood.
adaptit 6 hours ago | parent
Probably a naive question, but: couldn't you precompute some vector representation of the string once, and reduce collation to a vector comparison? Basically move the cost upfront and get back to the "fast" byte-comparison case?
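To make the question concrete, here is a minimal sketch of what I have in mind. It assumes a toy three-level weight table (primary for the base letter, secondary for accents, tertiary for case); `TOY_WEIGHTS` and `sort_key` are names I made up for illustration and are not taken from the article or from any real collator, which would use the full Unicode weight data instead.

```python
import struct
import unicodedata

# Hypothetical toy table: character -> (primary, secondary, tertiary) weights.
# A real collator would load many thousands of entries from the Unicode data files.
TOY_WEIGHTS = {
    "a": (0x0100, 0x0020, 0x0002),
    "A": (0x0100, 0x0020, 0x0008),
    "b": (0x0101, 0x0020, 0x0002),
    "B": (0x0101, 0x0020, 0x0008),
    "\u0301": (0x0000, 0x0024, 0x0002),  # combining acute: no primary weight
}


def sort_key(s: str) -> bytes:
    """Precompute a byte string whose plain byte order matches the toy collation order."""
    # Decompose first so that "\u00e1" becomes "a" + combining acute.
    elements = [TOY_WEIGHTS[ch] for ch in unicodedata.normalize("NFD", s)]

    flat = []
    for level in range(3):
        if level > 0:
            flat.append(0x0000)  # level separator, lower than any real weight
        # Zero weights are ignorable at that level and are skipped.
        flat.extend(e[level] for e in elements if e[level] != 0)

    # Big-endian 16-bit units, so comparing bytes compares weights in order.
    return struct.pack(f">{len(flat)}H", *flat)


words = ["b", "\u00e1b", "Ab", "ab", "a"]
keys = {w: sort_key(w) for w in words}   # cost paid once per string
ordered = sorted(words, key=keys.get)    # after that, plain byte comparison
print(ordered)  # ['a', 'ab', 'Ab', 'áb', 'b']
```

In this sketch all primary weights come first, so an accent or case difference only matters when the base letters are identical. The obvious cost is that every key has to be built and stored in full, even for strings that already differ at the first character.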