kijin | a day ago
The data representation is fairly straightforward once you're familiar with the composition rules, at least for modern Korean. Unicode simply lists all possible syllable combinations in dictionary order starting from U+AC00. So you can take any code point in that block and split out the 초성, 중성 and 종성 using simple arithmetic, just like you can work out a Latin letter from its ASCII code.
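For anyone curious, the arithmetic can be sketched roughly like this in Python. It assumes the standard layout of the precomposed block (19 initials, 21 vowels, 28 final slots including "no final"); the function names `split_syllable` and `join_syllable` are made up for illustration.

```python
# Sketch of Hangul syllable arithmetic for the precomposed block U+AC00..U+D7A3.
# Function and constant names here are illustrative, not from any library.
S_BASE = 0xAC00   # first precomposed syllable, 가
V_COUNT = 21      # number of 중성 (vowels)
T_COUNT = 28      # number of 종성 slots, where 0 means "no final consonant"
S_COUNT = 19 * V_COUNT * T_COUNT  # 11172 syllables in total


def split_syllable(ch: str) -> tuple[int, int, int]:
    """Return the (초성, 중성, 종성) indices of one precomposed syllable."""
    offset = ord(ch) - S_BASE
    if not 0 <= offset < S_COUNT:
        raise ValueError(f"{ch!r} is not a precomposed Hangul syllable")
    initial, rest = divmod(offset, V_COUNT * T_COUNT)
    vowel, final = divmod(rest, T_COUNT)
    return initial, vowel, final


def join_syllable(initial: int, vowel: int, final: int = 0) -> str:
    """Inverse of split_syllable: rebuild the code point from the indices."""
    return chr(S_BASE + (initial * V_COUNT + vowel) * T_COUNT + final)


if __name__ == "__main__":
    # 한 (U+D55C) splits into ㅎ, ㅏ, ㄴ
    print(split_syllable("\ud55c"))  # (18, 0, 4)
```

The indices count positions in dictionary order, so 18 is the last initial (ㅎ), 0 is the first vowel (ㅏ), and 4 is the final ㄴ.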
hyeonwho4 | 20 hours ago
초성 = initial sound (consonant), 중성 = middle sound (vowel), 종성 = final sound (consonant). My understanding is that Unicode allows two representations of Korean text: precomposed, syllable by syllable (typical on Windows), and decomposed, sound by sound (what macOS uses for filenames). I believe this is why Korean filenames created on macOS appear broken on modern Windows machines: both sides are valid UTF-8, but the normalization forms differ.
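If that understanding is right, the two forms correspond to Unicode normalization forms NFD (decomposed) and NFC (precomposed), and Python's standard `unicodedata.normalize` converts between them. Here is a minimal sketch; the helper name `to_precomposed` and the sample filename are only illustrative.

```python
import unicodedata


def to_precomposed(name: str) -> str:
    """Recompose sound-by-sound (NFD) Hangul into whole syllables (NFC)."""
    return unicodedata.normalize("NFC", name)


if __name__ == "__main__":
    precomposed = "\ud55c\uae00.txt"  # "한글.txt" as two syllable code points
    # Simulate a decomposed, macOS-style filename by converting to NFD.
    decomposed = unicodedata.normalize("NFD", precomposed)

    print(len(precomposed))  # 6 code points
    print(len(decomposed))   # 10 code points: each syllable became 3 jamo
    print(to_precomposed(decomposed) == precomposed)  # True
```

The two strings render the same when the font and renderer handle conjoining jamo, but they compare unequal byte for byte, so recomposing on one side is one way to make the names match again.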