| ▲ | jodleif 9 hours ago |
| I genuinely do not understand the valuations of the US AI industry. The Chinese models are so close and far cheaper. |
|
| ▲ | espadrine 6 hours ago | parent | next [-] |
| Two aspects to consider: 1. Chinese models typically focus on text. US and EU models also bear the cross of handling image, often voice and video. Supporting all those is additional training costs not spent on further reasoning, tying one hand in your back to be more generally useful. 2. The gap seems small, because so many benchmarks get saturated so fast. But towards the top, every 1% increase in benchmarks is significantly better. On the second point, I worked on a leaderboard that both normalizes scores, and predicts unknown scores to help improve comparisons between models on various criteria: https://metabench.organisons.com/ You can notice that, while Chinese models are quite good, the gap to the top is still significant. However, the US models are typically much more expensive for inference, and Chinese models do have a niche on the Pareto frontier on cheaper but serviceable models (even though US models also eat up the frontier there). |
| |
| ▲ | coliveira 5 hours ago | parent | next [-] | | Nothing you said helps with the issue of valuation. Yes, the US models may be better by a few percentage points, but how can they justify being so costly, both operationally and in investment? Over the long run this is a business, and you don't make money by being first; you have to be more profitable overall. | | |
| ▲ | ben_w 5 hours ago | parent | next [-] | | I think the investment race here is an "all-pay auction"*. Lots of investors have looked at the ultimate prize — basically winning something larger than the entire present world economy forever — and think "yes". But even assuming that we're on the right path for that (which we may not be) and assuming that nothing intervenes to stop it (which it might), there may be only one winner, and that winner may not have even entered the game yet. * https://en.wikipedia.org/wiki/All-pay_auction | | |
| ▲ | coliveira 5 hours ago | parent [-] | | > investors have looked at the ultimate prize — basically winning something larger than the entire present world economy This is what people like Altman want investors to believe. It seems like any other snake-oil scam because it doesn't match the reality of what he delivers. | | |
| |
| ▲ | ycombigrator 4 hours ago | parent | prev [-] | | [dead] |
| |
| ▲ | jodleif 6 hours ago | parent | prev | next [-] | | Re 1: have you seen the Qwen offerings? They have great multimodality, some of it even SOTA. | | |
| ▲ | brabel 6 hours ago | parent [-] | | Qwen Image and Image Edit were among the best image models until Nano Banana Pro came along. I have tried some open image models and can confirm: the Chinese models are easily the best or very close to the best, but right now the Google model is even better... we'll see if the Chinese models catch up again. | | |
| ▲ | BoorishBears 3 hours ago | parent [-] | | I'd say Google still hasn't caught up on the smaller-model side at all, but we've all been (rightfully) wowed enough by Pro to ignore that for now. Nano Banana Pro starts at 15 cents per image at <2K resolution, and is not strictly better than Seedream 4.0: yet the latter does 4K for 3 cents per image. Add in the power of fine-tuning their open-weight models and I don't know if China actually needs to catch up. I fine-tuned Qwen Image on 200 generations from Seedream 4.0 that were cleaned up with Nano Banana Pro, and got results that were as good as, and more reliable than, either model could achieve otherwise (a sketch of that kind of LoRA fine-tune follows this subthread). | | |
| ▲ | dworks 42 minutes ago | parent [-] | | FWIW, Qwen Z-Image is much better than Seedream, and people (redditors) are saying it's better than Nano Banana in their first trials. It's also 7B, I think, and open. | | |
| ▲ | BoorishBears 18 minutes ago | parent [-] | | I've used and finetuned Z-Image Turbo: it's nowhere near Seedream, or even Qwen-Image when the latter is finetuned (it also doesn't do image editing yet). It is very good for its size and speed, and I'm excited for the Edit and Base variants... but Reddit has been a bit "over-excited" because it runs on their small GPUs and isn't overly resistant to porn. |
|
|
|
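BoorishBears doesn't say which trainer was used for the Qwen Image fine-tune mentioned above, so the snippet below is only a minimal sketch of the underlying LoRA idea in plain PyTorch: freeze the base weights and train a low-rank residual on the attention projections. The to_q/to_k/to_v/to_out name filter is a guess at common diffusion-transformer layer names, not Qwen Image's actual module layout.

    import torch.nn as nn

    class LoRALinear(nn.Module):
        """y = W x + (alpha/r) * B(A(x)); only A and B are trained."""
        def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)          # freeze the original weights
            self.lora_a = nn.Linear(base.in_features, r, bias=False)
            self.lora_b = nn.Linear(r, base.out_features, bias=False)
            nn.init.normal_(self.lora_a.weight, std=0.02)
            nn.init.zeros_(self.lora_b.weight)   # update starts as a no-op
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

    def add_lora(model: nn.Module, r: int = 16) -> nn.Module:
        """Wrap attention projections (guessed names) with LoRA adapters."""
        for name, child in model.named_children():
            if isinstance(child, nn.Linear) and any(
                k in name for k in ("to_q", "to_k", "to_v", "to_out")
            ):
                setattr(model, name, LoRALinear(child, r=r))
            else:
                add_lora(child, r=r)
        return model

With a few hundred curated image/caption pairs, only the adapter parameters (a few million) carry gradients and optimizer state, which is what makes fine-tuning a large open-weight model tractable on a single GPU.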
| |
| ▲ | raincole 5 hours ago | parent | prev | next [-] | | > video Most of the AI-generated videos we see on social media now are made with Chinese models. | |
| ▲ | agumonkey 5 hours ago | parent | prev | next [-] | | Forgive me for bringing politics into it, but are Chinese LLMs more prone to censorship bias than US ones? | |
| ▲ | coliveira 5 hours ago | parent | next [-] | | Since they are open source, I believe Chinese models are less prone to censorship: US corporations can add censorship in several ways simply because they control a closed model. |
| ▲ | erikhorton an hour ago | parent | prev | next [-] | | Yes, it's extremely likely they are prone to censorship, based on their training. Try running them locally with something like LM Studio and ask questions the government is uncomfortable about. I originally thought the bias was in the GUI, but it's baked into the model itself. |
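For anyone who wants to reproduce this test: LM Studio serves whatever model you load through an OpenAI-compatible endpoint, by default at http://localhost:1234/v1. A minimal sketch follows; the model identifier is a placeholder for whatever you have loaded, and the API key can be any string for a local server.

    from openai import OpenAI

    # Point the standard OpenAI client at the local LM Studio server.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    resp = client.chat.completions.create(
        model="qwen3-8b",  # placeholder: use the id LM Studio shows for your model
        messages=[{"role": "user",
                   "content": "What happened at Tiananmen Square in 1989?"}],
    )
    print(resp.choices[0].message.content)

Running a question like this against the bare local model, as opposed to a hosted GUI, is the quickest way to tell whether a refusal lives in the weights or in the wrapper.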
| ▲ | skeledrew 5 hours ago | parent | prev [-] | | It's not about an LLM being prone to anything; it's more about the way an LLM is fine-tuned (which can be subject to the requirements of those wielding political power). | |
| |
| ▲ | torginus 6 hours ago | parent | prev [-] | | Thanks for sharing that! The scales are a bit murky here, but if we look at the 'Coding' metric, we see that Kimi K2 outperforms Sonnet 4.5, which is considered the price-performance darling, I think even today? I haven't tried these models, but in general there have been lots of cases where a model performs much worse IRL than the benchmarks would suggest (certain Chinese models and GPT-OSS have been guilty of this in the past). | |
| ▲ | espadrine 3 hours ago | parent [-] | | Good question. There are two points to consider. • For both Kimi K2 and Sonnet, there is a non-thinking and a thinking version.
Sonnet 4.5 Thinking is better than Kimi K2 non-thinking, but the K2 Thinking model came out recently and beats it on all comparable pure-coding benchmarks I know of: OJ-Bench (Sonnet: 30.4% < K2: 48.7%), LiveCodeBench (Sonnet: 64% < K2: 83%); they tie on SciCode at 44.8%. This finding is shared by ArtificialAnalysis: https://artificialanalysis.ai/models/capabilities/coding • The reason developers love Sonnet 4.5 for coding, though, is not just the quality of the code. They use Cursor, Claude Code, or some other system such as GitHub Copilot, all of which are increasingly agentic. On the Agentic Coding criterion, Sonnet 4.5 Thinking scores much higher. By the way, you can look at the Table tab to see all known and predicted results on benchmarks. |
|
|
|
| ▲ | jasonsb 7 hours ago | parent | prev | next [-] |
| It's all about the hardware and infrastructure. If you check OpenRouter, no provider offers a SOTA Chinese model matching the speed of Claude, GPT or Gemini. The Chinese models may benchmark close on paper, but real-world deployment is different. So you either buy your own hardware in order to run a Chinese model at 150-200tps, or give up and use one of the Big 3. The US labs aren't just selling models; they're selling globally distributed, low-latency infrastructure at massive scale. That's what justifies the valuation gap. Edit: It looks like Cerebras is offering a very fast GLM 4.6 |
| |
| ▲ | observationist 6 hours ago | parent | next [-] | | The network effects of using consistently behaving models and maintaining API coverage between updates are valuable, too. Presumably the big labs include their own domains of competence in the training, so Claude is likely to remain very good at coding and to behave in similar ways, informed and constrained by their prompt frameworks, so that interactions keep working in predictable ways even after major new releases, and upgrades can be clean. It'll probably be a few years before all of that becomes as smooth as people need, but OAI and Anthropic are already doing a good job on that front. Each new Chinese model requires a lot of testing and bespoke conformance work for every task you want to use it for. There's a lot of activity and shared prompt engineering, and some really competent people doing things out in the open, but it's generally going to take a lot more expert work to get the new Chinese models up to snuff than to work with the big US labs. Their product and testing teams do a lot of valuable work. | | |
| ▲ | dworks 37 minutes ago | parent [-] | | Qwen 3 Coder Plus has been braindead this past weekend, but Codex 5.1 has also been acting up. It told me updating UI styling was too much work and I should do it myself. I also see people complaining about Claude every week. I think this is an unsolved problem, and you also have to separate perception from actual performance, which may be an impossible task. |
| |
| ▲ | irthomasthomas 5 hours ago | parent | prev | next [-] | | Gemini 3 = ~70tps (https://openrouter.ai/google/gemini-3-pro-preview)
Opus 4.5 = ~60-80tps (https://openrouter.ai/anthropic/claude-opus-4.5)
Kimi-k2-think = ~60-180tps (https://openrouter.ai/moonshotai/kimi-k2-thinking)
Deepseek-v3.2 = ~30-110tps, only 2 providers rn (https://openrouter.ai/deepseek/deepseek-v3.2) | | |
| ▲ | jasonsb 5 hours ago | parent [-] | | It doesn't work like that. You need to actually use the model and then go to /activity to see the actual speed. I constantly get 150-200tps from the Big 3 while other providers barely hit 50tps even though they advertise much higher speeds. GLM 4.6 via Cerebras is the only one faster than the closed source models at over 600tps. | | |
| ▲ | irthomasthomas 5 hours ago | parent [-] | | These aren't advertised speeds, they are the average measured speeds by openrouter across different providers. |
|
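If you want your own numbers rather than OpenRouter's averages or a provider's advertised speeds, you can time a streamed completion yourself. A rough sketch against OpenRouter's OpenAI-compatible API; the model slug is one of those linked above, the key is a placeholder, and chunk count is only a proxy for token count.

    import time
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_OPENROUTER_KEY",  # placeholder
    )

    start = time.monotonic()
    first_token_at = None
    n_chunks = 0

    stream = client.chat.completions.create(
        model="moonshotai/kimi-k2-thinking",
        messages=[{"role": "user", "content": "Write ~500 words about benchmarks."}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.monotonic()  # latency to first token
            n_chunks += 1  # each streamed chunk carries roughly one token

    if first_token_at is None:
        raise RuntimeError("no content received")
    gen_time = time.monotonic() - first_token_at
    print(f"time to first token: {first_token_at - start:.2f}s")
    print(f"~{n_chunks / gen_time:.0f} tok/s (chunks as a token proxy)")

Measuring at your own peak hours matters, since the /activity numbers jasonsb describes reflect load that provider-level averages smooth over.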
| |
| ▲ | DeathArrow 5 hours ago | parent | prev | next [-] | | > If you check OpenRouter, no provider offers a SOTA chinese model matching the speed of Claude, GPT or Gemini. I think GLM 4.6 offered by Cerebras is much faster than any US model. | | | |
| ▲ | jodleif 6 hours ago | parent | prev | next [-] | | Assuming your hardware premise is right (and let's be honest, nobody really wants to send their data to Chinese providers), couldn't you use a provider like Cerebras or Groq? | |
| ▲ | kachapopopow 6 hours ago | parent | prev | next [-] | | Cerebras AI offers models at 50x the speed of Sonnet? | |
| ▲ | csomar 6 hours ago | parent | prev [-] | | According to OpenRouter, z.ai is 50% faster than Anthropic; which matches my experience. z.ai does have frequent downtimes but so does Claude. |
|
|
| ▲ | Bolwin 5 hours ago | parent | prev | next [-] |
Third-party providers rarely support caching. With caching, the expensive US models end up being only about 2x the price (e.g. Sonnet), and are often much cheaper (e.g. GPT-5 mini). If the third-party providers start caching, then the US companies will be completely outpriced. |
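A back-of-the-envelope check on that claim. All prices below are illustrative assumptions in USD per million tokens, not quotes: a Sonnet-class model at $3 in / $15 out with cache reads at a ~90% discount, versus an open model served by a third party without caching at $0.60 in / $2.20 out.

    def cost(n_in, n_out, p_in, p_out, cached_frac=0.0, cache_discount=0.9):
        """Dollar cost of a request mix, with a fraction of input tokens cached."""
        cached = n_in * cached_frac
        fresh = n_in - cached
        return (fresh * p_in + cached * p_in * (1 - cache_discount)
                + n_out * p_out) / 1e6

    # An agentic session: 2M input tokens (mostly repeated context), 50k output.
    us_cached = cost(2_000_000, 50_000, 3.00, 15.00, cached_frac=0.9)
    open_uncached = cost(2_000_000, 50_000, 0.60, 2.20)
    print(f"US model, 90% cache hits: ${us_cached:.2f}")     # ~$1.89
    print(f"open model, no caching:   ${open_uncached:.2f}")  # ~$1.31

Under these assumed numbers the cached US model lands around 1.4x the open model's price, the ballpark the comment describes; without caching the same session would cost about $6.75, roughly 5x.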
|
| ▲ | jazzyjackson 7 hours ago | parent | prev | next [-] |
| Valuation is not based on what they have done but on what they might do. I agree, though, that it's investment made with very little insight into Chinese research. I guess it's counting on DeepSeek being banned and all computers in America refusing to run open software by the year 2030. /snark |
| |
| ▲ | jodleif 6 hours ago | parent | next [-] | | > Valuation is not based on what they have done but on what they might do Exactly what I'm thinking. Chinese models are catching up rapidly, soon to be on par with the big dogs. | |
| ▲ | ksynwa 6 hours ago | parent [-] | | Even if they do continue to lag behind they are a good bet against monopolisation by proprietary vendors. | | |
| ▲ | coliveira 5 hours ago | parent [-] | | They would if corporations were allowed to run these models. I fully expect the US government to prohibit corporations from doing anything useful with Chinese models (full censorship). It's the same game they use with chips. |
|
| |
| ▲ | bilbo0s 6 hours ago | parent | prev [-] | | >I guess it's counting on DeepSeek being banned And the people making the bets are in a position to make sure the banning happens, the US government system being what it is. Not that our leaders need any incentive to ban Chinese tech in this space. Just pointing out that it's not necessarily a "bet". "Bet" implies you don't know the outcome and have no influence over it. Even "investment" implies you don't know the outcome. I'm not sure that's the case with these people? | |
| ▲ | coliveira 5 hours ago | parent [-] | | Exactly. "Business investment" these days means that the people involved will have at least some amount of power to determine the winning results. |
|
|
|
| ▲ | newyankee 7 hours ago | parent | prev | next [-] |
| Yet tbh, if the US industry had not moved ahead and created the race with FOMO, it would not have been easy for the Chinese strategy to work either. The nature of the race may yet change, though, and I am unsure whether the devil is in the details, as in very specific edge cases that will only work with frontier models? |
|
| ▲ | mrinterweb 5 hours ago | parent | prev | next [-] |
| I would expect that one of the motivations for making these LLM weights open is to undermine the valuations of the other players in the industry. Open models like this must diminish the value proposition of the frontier-focused companies if other companies can compete with similar results at competitive prices. |
|
| ▲ | fastball 5 hours ago | parent | prev | next [-] |
| They're not that close (on things like LMArena) and being cheaper is pretty meaningless when we are not yet at the point where LLMs are good enough for autonomy. |
|
| ▲ | rprend 2 hours ago | parent | prev | next [-] |
| People pay for products, not models. OpenAI and Anthropic make products (ChatGPT, Claude Code). |
|
| ▲ | beastman82 5 hours ago | parent | prev | next [-] |
| Then you should short the market |
|
| ▲ | isamuel 6 hours ago | parent | prev [-] |
| There is a great deal of orientalism --- it is genuinely unthinkable to a lot of American tech dullards that the Chinese could be better at anything requiring what they think of as "intelligence." Aren't they Communist? Backward? Don't they eat weird stuff at wet markets? It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed. Even now, when you ask of that era the kind of question you are asking here, the answers you get are genuinely no better than "yes, this should have been obvious at the time if you were not completely blinded by ethnic and especially ideological prejudice." |
| |
| ▲ | mosselman 6 hours ago | parent | next [-] | | Back when DeepSeek came out and people were tripping over themselves shouting that it was so much better than what was out there, it just wasn't good. This model might be super good, I haven't tried it, but to say the Chinese models are better is just not true. What I really love, though, is that I can run them (open models) on my own machine. The other day I categorised images locally using Qwen; what a time to be alive. Beyond local hardware, open models also make it possible to run on providers of choice, such as European ones. Which is great! So I love everything about the competitive nature of this. | |
| ▲ | CamperBob2 6 hours ago | parent [-] | | If you thought DeepSeek "just wasn't good," there's a good chance you were running it wrong. For instance, a lot of people thought they were running "DeepSeek" when they were really running some random distillation on ollama. | | |
| ▲ | bjourne 5 hours ago | parent [-] | | WDYM? Isn't https://chat.deepseek.com/ the real DeepSeek? | | |
| ▲ | CamperBob2 4 hours ago | parent [-] | | Good point, I was assuming the GP was running local for some reason. Hard to argue when it's the official providers who are being compared. I ran the 1.58-bit Unsloth quant locally at the time it came out, and even at such low precision, it was super rare for it to get something wrong that o1 and GPT4 got right. I have never actually used a hosted version of the full DS. |
|
|
| |
| ▲ | breppp 6 hours ago | parent | prev | next [-] | | Not sure how the entire Nazi comparison plays out, but at the time there were good reasons to imagine the Soviets would fall apart (as they initially did). Stalin had just finished purging his entire officer corps, which is not a good omen for war, and the USSR had failed miserably against the Finns, who were not the strongest of nations, while Germany had just steamrolled France, a country that was much more impressive in WW1 than the Russians (who collapsed against Germany). | |
| ▲ | ecshafer 4 hours ago | parent | prev | next [-] | | I don't think that anyone, much less someone working in tech or engineering in 2025, could still believe that the Chinese are not capable scientists or engineers. I could maybe give a (naive) pass to someone in 1990 thinking China would never build more than junk. But in 2025, their productive capacity, their scientific advancement, and simply the number of us who have worked with extremely talented Chinese colleagues should dispel those notions. I think you are jumping to racism a bit fast here. Germany was right in some ways and wrong in others about the Soviet Union's strength. The USSR failed to conquer Finland because of the military purges. German intelligence vastly underestimated the number of tanks and the general preparedness of the Soviet army (Hitler was shocked the Soviets already had 40k tanks). The Lend-Lease Act sent an astronomical amount of goods to the USSR, which allowed them to fully commit to the war and focus on increasing their weapons production; the numbers on the amount of tractors, food, trains, ammunition, etc. that the US sent to the USSR are staggering. | |
| ▲ | hnfong 2 hours ago | parent [-] | | I don't think anyone seriously believes that the Chinese aren't capable; it's more that people believe that no matter what happens, the USA will still dominate in "high tech" fields. A variant of "American Exceptionalism", so to speak. This is kinda reflected in the stock market, where the AI stocks are surging to new heights every day, yet their Chinese equivalents are relatively lagging behind in stock price, which suggests that investors are betting heavily on the US companies to "win" this "AI race" (if there are any gains to be made by winning). Also, in the past couple of years (or maybe a couple of decades), there has been a lot of crap talk about how China has to democratize and free up its markets in order to be competitive with the other first-world countries, together with a bunch of "doomsday" predictions for authoritarianism in China. This narrative has completely lost any credibility, but the sentiment dies slowly... |
| |
| ▲ | newyankee 6 hours ago | parent | prev | next [-] | | But didn't the Chinese already surpass the rest of the world in solar, batteries, and EVs, among other things? | |
| ▲ | cyberlimerence 6 hours ago | parent [-] | | They did, but the goalposts keep moving, so to speak. We're approximately here: advanced semiconductors, artificial intelligence, reusable rockets, quantum computing, etc. The Chinese will never catch up. /s |
| |
| ▲ | gazaim 4 hours ago | parent | prev | next [-] | | These Americans have no comprehension of intelligence being used to benefit humanity instead of being used to fund a CEO's new yacht. I encourage them to visit China to see how far the USA lags behind. | |
| ▲ | lukan 6 hours ago | parent | prev | next [-] | | "It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race;
..." Ideology played a role, but the data they worked with was the Finnish war, which was disastrous for the Soviet side. Hitler later famously said it was all an intentional distraction to make them believe the Soviet army was worth nothing. (The real reasons were more complex, like the previous purges.) | | |
| ▲ | littlestymaar 6 hours ago | parent | prev [-] | | > It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed Though, because Stalin had decimated the Red Army leadership (including most of the veteran officers who had Russian Civil War experience) during the Moscow-trials purges, the Germans almost succeeded. | |
| ▲ | gazaim 3 hours ago | parent [-] | | > Though, because Stalin had decimated the Red Army leadership (including most of the veteran officers who had Russian Civil War experience) during the Moscow-trials purges, the Germans almost succeeded. There were many counter-revolutionaries among the leadership, even among those conducting the purges. Stalin was like "ah fuck, we're hella compromised." Many revolutions fail at this step and often end up facing a CIA-backed coup. The USSR was under constant siege and attempted infiltration since its inception. | |
| ▲ | littlestymaar 3 hours ago | parent [-] | | > There were many counter-revolutionaries among the leadership Well, Stalin was, by far, the biggest counter-revolutionary in the Politburo. > Stalin was like "ah fuck, we're hella compromised." There's no evidence that anything significant was compromised at that point, and clear evidence that Stalin was in fact clinically paranoid. > Many revolutions fail at this step and often end up facing a CIA-backed coup. The USSR was under constant siege and attempted infiltration since its inception. Can we please not recycle 90-year-old Soviet propaganda? The Moscow trials being irrational self-harm was acknowledged by the USSR leadership as early as the fifties… |
|
|
|