| ▲ | yellowapple 3 days ago |
| “Do non-American LLMs (e.g. DeepSeek, Mistral, Apertus) perform better or worse here? Do they have their own cultural biases in-built?” I'm wondering the same thing, in addition to the related question of “Would an LLM perform better or worse if prompted with languages other than English?”. |
|
| ▲ | tropdrop 3 days ago | parent | next [-] |
ChatGPT is worse in Russian. Example: after accurately noting that a name appeared in a particular Russian book, it asked if I wanted the direct quote in Russian. I said yes. At that point it switched to Russian output but could no longer find the name in the book, and apologized for having used what it called "approximations" about the book earlier. (I then went and checked the book myself; ChatGPT in English was right, the name is there.) |
| |
| ▲ | ehnto 3 days ago | parent [-] | | I was using Qwen3 locally in thinking mode and noticed that even when it is talking to me in Japanese, it does its "thinking" steps in English. Not having a full understanding of how the layers in an LLM handle language connections, I can't say for sure, but for a human this would result in subpar outcomes. For example (not actual output):

    Input: "こにちは" (konichwa)
    Qwen Thinking: "Ah, the user has said こにちは, I should respond in a kind and friendly manner."
    Qwen Output: こにちは!

It gets confused much more quickly this way than it does in English. | | |
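For anyone who wants to poke at this themselves, here's roughly how you can inspect the trace. This is a sketch assuming the Hugging Face transformers Qwen3 chat template, which exposes an enable_thinking flag; the checkpoint name is just illustrative:

    # Minimal sketch: see which language Qwen3 "thinks" in when prompted
    # in Japanese. Assumes the HF transformers Qwen3 chat template and its
    # enable_thinking flag; the model name is illustrative.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen3-4B"  # any local Qwen3 checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    messages = [{"role": "user", "content": "こにちは"}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=True,  # emit a <think>...</think> trace before the answer
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    decoded = tokenizer.decode(out[0][inputs.input_ids.shape[-1]:])

    # The reasoning sits between <think> tags; in my runs it came out in
    # English even though the prompt and the final answer were Japanese.
    thinking = decoded.split("<think>")[-1].split("</think>")[0]
    print(thinking)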
| ▲ | numpad0 3 days ago | parent | next [-] | | I'm kind of wondering when it will become universally understood that LLMs can't be trained on equal amounts of Japanese and Chinese content due to Han Unification: the two languages share the same Unicode codepoints, so the training data becomes an incoherent mix of two conflicting syntaxes in one script. It's remarkable that Latin-script languages don't appear to face the same issue, and there's no clear technical explanation as to why; my guess is it has to do with the granularity of characters. That said, in my limited experience, LLMs all think in their dataset-majority language. They don't adhere to the prompt language, one way or another. Chinese models usually think in either English or Chinese, rarely in a cursed mix thereof, and never in Japanese or any other language that isn't native to them. | | |
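To make the Han Unification point concrete: at the codepoint level there is literally nothing marking a unified ideograph as Japanese or Chinese. A quick standard-library check (the three characters are common examples whose preferred glyph shapes differ between the two languages):

    import unicodedata

    # Han Unification: Japanese and Chinese share the same CJK codepoints,
    # so nothing at the codepoint level says which language a character
    # came from; the tokenizer sees identical input either way.
    for ch in "直骨海":
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    # U+76F4 CJK UNIFIED IDEOGRAPH-76F4
    # U+9AA8 CJK UNIFIED IDEOGRAPH-9AA8
    # U+6D77 CJK UNIFIED IDEOGRAPH-6D77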
| ▲ | ehnto 3 days ago | parent | next [-] | | Would they not quickly become divergent vectors, in the same way that apple and Apple can exist in the same vector set with totally different meanings? So all the information gleaned from reading a glyph in the context of Japanese articles would end up in totally different vectors than the information gleaned from the same glyph in Chinese? | | |
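One crude way to test this intuition, assuming a multilingual encoder such as bert-base-multilingual-cased (the model and sentences are only illustrative; 手紙 is the classic example, meaning "letter" in Japanese and "toilet paper" in Chinese):

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Embed the same glyphs (手紙) in a Japanese and a (traditional)
    # Chinese sentence and compare the contextual vectors.
    tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModel.from_pretrained("bert-base-multilingual-cased")

    def word_vec(sentence: str, word: str) -> torch.Tensor:
        inputs = tok(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]  # (seq_len, dim)
        ids = tok(word, add_special_tokens=False).input_ids
        seq = inputs.input_ids[0].tolist()
        # crude span match: average hidden states over the word's tokens
        for i in range(len(seq) - len(ids) + 1):
            if seq[i:i + len(ids)] == ids:
                return hidden[i:i + len(ids)].mean(dim=0)
        raise ValueError("word not found as a contiguous token span")

    ja = word_vec("友達に手紙を書きました。", "手紙")  # "I wrote a letter to a friend."
    zh = word_vec("廁所裡沒有手紙了。", "手紙")        # "The toilet is out of toilet paper."
    print(torch.cosine_similarity(ja, zh, dim=0))  # well below 1.0 if context separates them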
| ▲ | numpad0 3 days ago | parent [-] | | I don't know, but at least the older Qwen models were a bit confused about which words belong to which language, and recent ones seem noticeably less sure about ja-JP in general. Maybe it vaguely relates to Hanzi/Kanji characters being more coarse-grained than the Latin alphabet, so there aren't enough characters in a given span to tell the languages apart, or something. |
| |
| ▲ | ACCount37 3 days ago | parent | prev [-] | | Why would that be an issue? |
| |
| ▲ | charlieyu1 3 days ago | parent | prev | next [-] | | I don't think this can be solved until there is massive investment in training LLMs on native Japanese. The current ChatGPT tokenizer still uses byte-level BPE, and many Japanese characters can't even be represented by a single token. | |
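This is easy to eyeball with tiktoken (the encoding name is just one choice; the exact splits depend on the vocabulary):

    import tiktoken

    # How many BPE tokens does each Japanese character cost?
    enc = tiktoken.get_encoding("o200k_base")
    for ch in "こんにちは漢字薔薇":
        toks = enc.encode(ch)
        print(ch, toks, f"-> {len(toks)} token(s)")
    # Common kana/kanji may be single tokens, but rarer characters fall
    # back to 2-3 byte-level tokens apiece, so Japanese burns context
    # much faster than English per unit of meaning.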
| ▲ | lmm 3 days ago | parent | prev | next [-] | | Perhaps it knows most users who misspell こんにちは are English speakers? | | |
| ▲ | ehnto 3 days ago | parent [-] | | Ah nah, that was just me; I'm no good with the phone IME. I tried a bunch of different sentences and it always thought in English. It was pretty good at one-shot translations with thinking turned off, though. I imagine thinking distracts it from going down the Japanese-only vector paths. |
| |
| ▲ | ACCount37 3 days ago | parent | prev [-] | | Quite a few reasoning LLMs do their reasoning in English only, because the RL setup specifically forces them to. Why? Because the creators want the reasoning trace to be human-readable, and without pressure forcing them to think in English, they tend to get weird with it: wild language-mixing, devolved grammar, strange language-mixed nonsense words that the LLM itself seemingly understands just fine. |
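A toy sketch of how that pressure might look (not any lab's actual setup, just illustrating the idea of a language-consistency term bolted onto the reward):

    # Toy sketch: penalize reasoning traces that drift out of English.
    def english_ratio(trace: str) -> float:
        """Fraction of alphanumeric characters that are plain ASCII."""
        alnum = [c for c in trace if c.isalnum()]
        return sum(c.isascii() for c in alnum) / len(alnum) if alnum else 1.0

    def shaped_reward(task_reward: float, trace: str, weight: float = 0.5) -> float:
        # final reward = task success minus a penalty for non-English thinking
        return task_reward - weight * (1.0 - english_ratio(trace))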
|
|
|
| ▲ | Miraltar 3 days ago | parent | prev | next [-] |
I assume the training dataset is mostly the same anyway. I imagine prompting in a different language could have a huge effect, though. |