| ▲ | reedlaw 3 hours ago | |||||||
Are you saying Chinese is more concise than English? Chinese poetry is concise, but that can be true in any language. For LLMs, it depends on the tokenizer. Chinese models are of course more Chinese-friendly and so would encode the same sentence with fewer tokens than Western models. | ||||||||
| ▲ | pxc an hour ago | parent [-] | |||||||
> Are you saying Chinese is more concise than English? Yeah, definitely. It lacks case and verb conjugations, plus whole classes of filler words, and words themselves are on average substantially shorter. If you listen to or read a hyper-literal transliteration of Chinese speech into English (you can find fun videos of this on Chinese social media), it even resembles "caveman speech" for those reasons. If you look at translated texts and compare the English versions to the Chinese ones, the Chinese versions are substantially shorter. Same if you compare localization strings in your favorite open-source project. It's also part of why Chinese apps are so information-dense, and why localizing to other languages often requires reorganizing the layout itself— languages like English just aren't as information-dense, pixel for pixel. The difference is especially profound for vernacular Chinese, which is why Chinese people often note that text which "has a machine translation flavor" is over-specified and gratuitously prolix. Maybe some of this washes out in LLMs due to tokenization differences. But Chinese texts are typically shorter than English texts and it extends to prose as well as poetry. But yeah this is standard stuff: Chinese is more concise and more contextual/ambiguous. More semantic work is allocated in interpretation than with English, less is allocated in the writing/speaking. Do you speak Chinese and experience the differences between Chinese and English differently? I'm a native English speaker and only a beginner in Chinese but I've formed these views in discussion with Chinese people who know some English as well. | ||||||||
| ||||||||