bjoli 4 hours ago

What has been going on with DeepSeek recently? I have gotten lots of replies in Chinese and, even more frequently, reasoning in Chinese as well.

Is it a new silent update?

throwa356262 an hour ago | parent | next [-]

Happened to me with Claude, doesn't need to be a China thing.

Shank 4 hours ago | parent | prev | next [-]

Well, it is a Chinese model, maybe it thinks better in Chinese?

bogdan 3 hours ago | parent [-]

Hànzì can use 30%-40% fewer tokens than English. So, yes, it probably thinks better in Chinese.

Razengan 2 hours ago | parent [-]

If so, would other models like ChatGPT benefit from translating the user's prompt to Chinese/Japanese and thinking in Hanzi/Kanji and then converting the response back to the user's language before displaying it?

cocoflunchy 2 hours ago | parent | next [-]

I believe that most reasoning models actually think in their own "language", which is not really understandable by humans. The thinking traces shown in the UI are actually summaries generated by a smaller model in plain English (or the user's language). Sometimes this leaks through and you see some Chinese/Japanese characters in e.g. Claude's reasoning.

dryarzeg 2 hours ago | parent | next [-]

As far as I'm aware, that's not true for models like DeepSeek or other Chinese open-weight models (at least those that I have seen): their reasoning traces are written entirely in some human language, be it English, Chinese or another one. By the way, most of them can adapt their reasoning to the user's language; for example, if the user writes in English, the reasoning is more likely to be in English.

I think the explanation for the DeepSeek problem (thinking and replying in Chinese) is kinda simpler: in their official chat, they're probably using some kind of system prompt which is (probably) written in Chinese, so that's why the model may prefer Chinese in its output.

kgeist 2 hours ago | parent | prev | next [-]

Summaries generated by a separate, smaller model are usually something closed proprietary models like Claude do, as a way to combat the distillation of real reasoning traces by competitors. Open-weight models show the real reasoning traces. Reasoning traces operate in the same space as the non-reasoning output. It's all just one large text for an LLM. Internally, reasoning is just ordinary chat completion between <think></think> tags.
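
To make the <think></think> point concrete, here is a minimal sketch of how a client could separate the reasoning from the answer. It assumes the raw completion contains at most one <think>...</think> block; the function name `split_reasoning` and the sample string are made up for illustration, and real chat templates may differ.

```python
import re

# Assumption: reasoning is wrapped in a single <think>...</think> block.
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) from one raw completion string."""
    match = THINK_RE.search(raw)
    if match is None:
        # No think block: treat the whole text as the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think block is the visible answer.
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    # Hypothetical completion with mixed-language reasoning.
    raw = "<think>User asks 2+2. 这很简单: 4.</think>The answer is 4."
    reasoning, answer = split_reasoning(raw)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

The point is only that both parts come out of the same token stream, so nothing stops the reasoning half from being in a different language than the answer half.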

seydor 2 hours ago | parent | prev [-]

> summaries generated

Or hallucinated

bogdan 2 hours ago | parent | prev | next [-]

There are other even more efficient ways of doing this, i.e. using images instead of raw text https://xcancel.com/karpathy/status/1980397031542989305?lang...

grogg 2 hours ago | parent | prev [-]

Yeah, it’s why the Caveman skill includes a Wenyan mode.

https://github.com/JuliusBrussee/caveman

k__ 2 hours ago | parent | prev | next [-]

Maybe you could pipe it through T5 or something.

serf 3 hours ago | parent | prev | next [-]

This happens to me a lot when I ask a qwen3.6 model to respond to a question in JSON. No clue why.

surgical_fire 3 hours ago | parent | prev | next [-]

I use DeepSeek daily, never happened to me.

I use the API however, not the chat interface.

cicko 2 hours ago | parent | prev | next [-]

It's a hint that you should start learning the new lingua franca.

abyssin 4 hours ago | parent | prev | next [-]

It doesn’t seem that recent to me; it has been like that for at least six months.

RIshabh235 4 hours ago | parent | prev | next [-]

Yes, kind of a silent update. Plus, they might have better Chinese datasets and user data for their training, which might be leading to a preference for Chinese.

alfiedotwtf 3 hours ago | parent | prev | next [-]

Are you running out of context? I’ve found that tooling problems and gibberish mostly happen when I’m butting up against the high watermark of my context window. One other thing it could be: I’ve read that lower quants like Q1 and Q2 for smaller models can leak Chinese.

epolanski 4 hours ago | parent | prev [-]

It never happened to me with DeepSeek, but it happened multiple times with Kimi 2.6.

It also happened a handful of times with Anthropic models.