numpad0 4 days ago

I think the better question is why humans mostly don't use em dashes in the first place while LLMs use them all the time. And the short answer to that is: because it's Unicode stuff.

Regular computers for human use still, to this day, only support ASCII in the US or ISO-8859-1 in the EU, and Unicode-reliant East Asian users turn off their Unicode input modes before typing English words, leaving the Asian part mostly in pure Unicode and the alphanumeric part in pure ASCII. So text that mixes Unicode and ASCII is just odd by itself. This in turn makes the use of em dashes odd.

Same with emojis. LLMs generate Unicode-mapped tokens directly, so they can emit any character within the full Unicode range. Humans with keyboards (physical or touchscreen) can mostly only produce what's on them.
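
To make the ASCII vs. Unicode split concrete, here's a rough Python sketch that picks out the characters in a string that fall outside 7-bit ASCII. The helper name `non_ascii_chars` and the sample sentence are mine, purely for illustration; it says nothing about who wrote a text, it only shows that the em dash (U+2014) isn't something a plain ASCII keyboard layout emits, while the hyphen-minus is.

```python
import unicodedata

def non_ascii_chars(text):
    """Return (char, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 0x7F  # anything above 0x7F is outside 7-bit ASCII
    ]

# Hypothetical sample: a keyboard hyphen-minus next to an em dash (U+2014).
sample = "Typed by hand - like this. Generated \u2014 like this."

print(non_ascii_chars(sample))
# The hyphen-minus is a single ASCII byte; the em dash takes three bytes in UTF-8.
print("-".encode("utf-8"), "\u2014".encode("utf-8"))
```

Only the em dash shows up in the list; the hyphen-minus passes through as plain ASCII.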