linzhangrun 2 hours ago

deepseek-v4-pro, probably the representative cheap open-source LLM, was released in April 2026. One year before that, what OAI had in hand was gpt-4.1 and gpt-o3. I think it is not very controversial to say that deepseek is stronger than them; at most you can point to some post-training problems, basically the instability you mentioned. Also, I am not sure if it is because the people who are best at using AI -- the people making AI -- get more development speed as the models get smarter, but my feeling is that model progress is getting faster and faster. GPT-3.5 and GPT-4 were almost one year apart. The disadvantage from hardware limits and compute shortage is visible in the size of Chinese models: glm-5.2, which is claimed to be around opus-4.6 level in coding, is only 744B. But Chinese engineers are obviously, how to put it, getting very effective results on "performance at the same size". And that is not even talking about the advantages from China's electricity, manpower, or even "national will" to compete against America. So saying it may take three years to close a gap that is now only several months looks too pessimistic. ChatGPT itself was released only three and a half years ago, and today is already a completely different world.

sho 2 hours ago

You may be right, and I certainly hope so!

But the question was about whether the Chinese labs will have fable-equivalence in 1 year. I am by no means some kind of insider, but knowing the vaguest outlines of what went into Mythos, they just can't do it. The compute is not there. The Chinese engineers are incredible, but they're not literal magicians.

Of course there could be something incredible to come out of left field and overturn the apple cart yet again, but that's speculation. It would be awesome, sure! But I wouldn't bet too heavily on it.

And FWIW - again, no disrespect at all to the Chinese engineers - I don't rate GLM5.2 as being even close to opus 4.6. It can hit a few benchmarks, sure, but that's the top edge of the "jag". Filling in the rest of the capabilities, again, takes compute and data the OSS labs just don't have, that anyone knows about at least.