| ▲ | raaspazasu 5 hours ago | |||||||||||||||||||||||||||||||
I don't know Korean at all, but this looks cool and a fun project. I'm curious if this reduces typing or has any benefits being in Hangul vs Latin? | ||||||||||||||||||||||||||||||||
| ▲ | xodn348 5 hours ago | parent | next [-] | |||||||||||||||||||||||||||||||
Thanks! One thing that motivated me was curiosity about prompt efficiency in the AI era. Hangul is beautifully dense — a single syllable block packs initial consonant + vowel + final consonant into one character. I wondered if Korean-keyword code might produce shorter prompts for LLMs. I actually tested this with GPT-4o's tokenizer, and the result was the opposite — Korean keywords average 2-3 tokens vs 1 for English. A fibonacci program in Han takes 88 tokens vs 54 in Python. The reason comes down to how LLM tokenizers work. They use BPE (Byte Pair Encoding), which starts with raw bytes and repeatedly merges the most frequent pairs into single tokens. Since training data is predominantly English, words like `function` and `return` appear billions of times and get merged into single tokens. Korean text appears far less frequently, so the tokenizer doesn't learn to merge Hangul syllables — it falls back to splitting each character into 2-3 byte-level tokens instead. It's a tokenizer training bias, not a property of Hangul itself. If a tokenizer were trained on a Korean-heavy corpus, `함수` could absolutely become a single token too. So no efficiency benefit today. But it was a fun exploration, and Korean speakers can read the code like natural language. It could also be a fun way for people learning Korean to practice reading Hangul in a different context — every keyword is a real Korean word with meaning. | ||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||
| ▲ | bbrodriguez 5 hours ago | parent | prev [-] | |||||||||||||||||||||||||||||||
Korean doesn’t reduce typing compared to English from my experience. What looks like a “character” is actually a syllable block called “eumjeol” that’s made up of consonants (moeum)and vowels (jaeum). You can’t have a vowel only syllable either so you always have to pair it with a null consonant no matter what (which kinda looks like a zero: ㅇ) and while nouns can be much more concise compared to English, verbs can get verbose. The main benefit of Korean actually comes from the fact that the language itself fits perfectly into a standard 27 alphabet keys and laid out in such a way that lets you type ridiculously fast. The consonant letters are always situated in the left half and the vowels are in the right half of the keyboard. This means it is extremely easy to train muscle memory because you’re mostly alternating keystrokes on your left hand and right hand. Anecdotally I feel like when I’m typing in English, each half of my brain needs to coordinate more compared to when I’m typing in Korean, the right brain only need to remember the consonant positions for my left hand and my left brain only need to remember the vowel positions. | ||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||