| ▲ | espadrine 6 hours ago |
| Two aspects to consider: 1. Chinese models typically focus on text. US and EU models also bear the cross of handling images, and often voice and video. Supporting all of those adds training cost not spent on further reasoning, tying one hand behind your back in exchange for being more generally useful. 2. The gap seems small because so many benchmarks get saturated so fast, but towards the top, every 1% gain on a benchmark means a significantly better model. On the second point, I worked on a leaderboard that both normalizes scores and predicts unknown scores, to help compare models on various criteria: https://metabench.organisons.com/ You can notice that, while Chinese models are quite good, the gap to the top is still significant. However, the US models are typically much more expensive for inference, and Chinese models do have a niche on the Pareto frontier of cheaper but serviceable models (even though US models also eat into that end of the frontier). |
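(Aside: the site doesn't document its exact method, so the Python below is only a hypothetical sketch of how "normalize scores, then predict the missing ones" could work: per-benchmark min-max scaling followed by a rank-1 SVD imputation, on made-up numbers.)

```python
import numpy as np

# Hypothetical model x benchmark score matrix; NaN = score never reported.
# (Made-up numbers, purely for illustration.)
scores = np.array([
    [88.0, 71.0, np.nan],   # model A
    [85.0, np.nan, 60.0],   # model B
    [80.0, 64.0, 55.0],     # model C
])

# 1. Normalize each benchmark to [0, 1] so different scales are comparable.
lo = np.nanmin(scores, axis=0)
hi = np.nanmax(scores, axis=0)
norm = (scores - lo) / (hi - lo)

# 2. Predict missing entries by iterative rank-1 SVD imputation:
#    fill with column means, project onto a rank-1 approximation, repeat.
filled = np.where(np.isnan(norm), np.nanmean(norm, axis=0), norm)
for _ in range(100):
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    approx = s[0] * np.outer(u[:, 0], vt[0])          # rank-1 reconstruction
    filled = np.where(np.isnan(norm), approx, norm)   # keep known scores fixed

print(filled.round(3))  # missing cells now hold predicted normalized scores
```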
|
| ▲ | coliveira 5 hours ago | parent | next [-] |
| Nothing you said helps with the issue of valuation. Yes, the US models may be better by a few percentage points, but how can they justify being so costly, both operationally and in terms of investment? Over the long run, this is a business, and you don't make money by being first; you have to be more profitable overall. |
| |
| ▲ | ben_w 5 hours ago | parent | next [-] | | I think the investment race here is an "all-pay auction"*. Lots of investors have looked at the ultimate prize — basically winning something larger than the entire present world economy forever — and think "yes". But even assuming that we're on the right path for that (which we may not be) and assuming that nothing intervenes to stop it (which it might), there may be only one winner, and that winner may not have even entered the game yet. * https://en.wikipedia.org/wiki/All-pay_auction | | |
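(To make the all-pay intuition concrete: in the textbook two-bidder, complete-information all-pay auction with a prize worth 1, the mixed equilibrium has each bid uniform on [0, 1], every bidder pays their bid win or lose, and the expected profit of entering is zero. A toy Monte Carlo, purely illustrative:)

```python
import random

TRIALS = 200_000
PRIZE = 1.0
profit_a = 0.0

for _ in range(TRIALS):
    # Equilibrium of the 2-bidder complete-information all-pay auction:
    # each bidder mixes uniformly over [0, PRIZE].
    bid_a = random.random()
    bid_b = random.random()
    # Both pay their bids; only the higher bidder gets the prize.
    profit_a += (PRIZE if bid_a > bid_b else 0.0) - bid_a

print(f"bidder A's average profit: {profit_a / TRIALS:+.4f}")  # converges to ~0
```

On average the field as a whole spends about as much as the prize is worth, which is the point about the investment race.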
| ▲ | coliveira 5 hours ago | parent [-] | | > investors have looked at the ultimate prize — basically winning something larger than the entire present world economy. This is what people like Altman want investors to believe. It seems like any other snake oil scam because it doesn't match the reality of what he delivers. | | |
| |
| ▲ | ycombigrator 4 hours ago | parent | prev [-] | | [dead] |
|
|
| ▲ | jodleif 6 hours ago | parent | prev | next [-] |
| 1. Have you seen the Qwen offerings? They have great multi-modality, some even SOTA. |
| |
| ▲ | brabel 6 hours ago | parent [-] | | Qwen Image and Image Edit were among the best image models until Nano Banana Pro came along. I have tried some open image models and can confirm: the Chinese models are easily the best or very close to it, but right now the Google model is even better... we'll see if the Chinese catch up again. | | |
| ▲ | BoorishBears 3 hours ago | parent [-] | | I'd say Google still hasn't caught up on the smaller model side at all, but we've all been (rightfully) wowed enough by Pro to ignore that for now. Nano Banana Pro starts at 15 cents per image at <2K resolution and is not strictly better than Seedream 4.0, yet the latter does 4K for 3 cents per image. Add in the power of fine-tuning on their open-weight models and I don't know if China actually needs to catch up. I fine-tuned Qwen Image on 200 generations from Seedream 4.0 that were cleaned up with Nano Banana Pro, and got results that were as good as, and more reliable than, either model could achieve on its own. | | |
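(For anyone curious what that fine-tuning pipeline looks like in practice: it mostly reduces to pairing the cleaned-up images with their prompts in a training manifest. The paths and manifest layout below are hypothetical; most open diffusion LoRA trainers accept some variant of this.)

```python
import json
from pathlib import Path

# Hypothetical layout: one prompt .txt next to each .png, where the images
# were generated by Seedream 4.0 and then cleaned up with Nano Banana Pro.
DATA_DIR = Path("seedream_cleaned")
OUT = Path("train_manifest.jsonl")

with OUT.open("w", encoding="utf-8") as f:
    for img in sorted(DATA_DIR.glob("*.png")):
        prompt_file = img.with_suffix(".txt")
        if not prompt_file.exists():
            continue  # skip images that are missing a prompt
        record = {
            "image": str(img),
            "prompt": prompt_file.read_text(encoding="utf-8").strip(),
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

print(f"wrote fine-tuning manifest to {OUT}")
```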
| ▲ | dworks 43 minutes ago | parent [-] | | FWIW, Qwen Z-Image is much better than Seedream, and people (redditors) are saying it's better than Nano Banana in their first trials. It's also 7B, I think, and open. | | |
| ▲ | BoorishBears 20 minutes ago | parent [-] | | I've used and fine-tuned Z-Image Turbo: it's nowhere near Seedream, or even Qwen-Image when the latter is fine-tuned (it also doesn't do image editing yet). It is very good for its size and speed, and I'm excited for the Edit and Base variants... but Reddit has been a bit "over-excited" because it runs on their small GPUs and isn't overly resistant to porn. |
|
|
|
|
|
| ▲ | raincole 5 hours ago | parent | prev | next [-] |
| > video Most of the AI-generated videos we see on social media now are made with Chinese models. |
|
| ▲ | agumonkey 5 hours ago | parent | prev | next [-] |
| Forgive me for bringing politics into it, but are Chinese LLMs more prone to censorship bias than US ones? |
| |
| ▲ | coliveira 5 hours ago | parent | next [-] | | Being open source, Chinese models are, I believe, less prone to censorship: US corporations can add censorship in several ways precisely because they control a closed model. | |
| ▲ | erikhorton an hour ago | parent | prev | next [-] | | Yes, it's extremely likely they are prone to censorship, based on the training. Try running them locally with something like LM Studio and ask questions the government is uncomfortable about. I originally thought the bias was in the GUI, but it's baked into the model itself. | |
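(If you want to try this yourself: LM Studio serves any loaded model through an OpenAI-compatible endpoint on localhost, port 1234 by default, so a few lines of Python are enough to probe it. The model identifier below is a placeholder for whatever model you have loaded.)

```python
from openai import OpenAI  # pip install openai

# LM Studio's local server speaks the OpenAI API; no real key is required.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",  # placeholder: use the identifier LM Studio displays
    messages=[{"role": "user",
               "content": "What happened in Beijing in June 1989?"}],
)
print(resp.choices[0].message.content)
```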
| ▲ | skeledrew 5 hours ago | parent | prev [-] | | It's not about an LLM being prone to anything, but more about the way an LLM is fine-tuned (which can be subject to the requirements of those wielding political power). | | |
|
|
| ▲ | torginus 6 hours ago | parent | prev [-] |
| Thanks for sharing that! The scales are a bit murky here, but if we look at the 'Coding' metric, we see that Kimi K2 outperforms Sonnet 4.5, which I think is still considered the price-perf darling even today? I haven't tried these models, but in general there have been lots of cases where a model performs much worse IRL than the benchmarks would suggest (certain Chinese models and GPT-OSS have been guilty of this in the past). |
| |
| ▲ | espadrine 3 hours ago | parent [-] | | Good question. There are two points to consider. • For both Kimi K2 and Sonnet, there is a non-thinking and a thinking version. Sonnet 4.5 Thinking is better than Kimi K2 non-thinking, but the K2 Thinking model came out recently and beats it on all comparable pure-coding benchmarks I know of: OJ-Bench (Sonnet: 30.4% < K2: 48.7%), LiveCodeBench (Sonnet: 64% < K2: 83%); they tie on SciCode at 44.8%. This finding is shared by ArtificialAnalysis: https://artificialanalysis.ai/models/capabilities/coding • The reason developers love Sonnet 4.5 for coding, though, is not just the quality of the code. They use Cursor, Claude Code, or some other system such as GitHub Copilot, all of which are increasingly agentic. On the Agentic Coding criterion, Sonnet 4.5 Thinking scores much higher. By the way, you can look at the Table tab to see all known and predicted results on benchmarks. |
|