Agree, but I guess the Opus 4.6 is 10x larger, rather than Chinese models being 10x more efficient. It is said that GPT-4 is already a 1.6T model, and Llama 4 behemoth is also much bigger than Chinese open-weight models. Chinese tech companies are short of frontier GPUs, but they did a lot of innovations on inference efficiency (Deepseek CEO Liang himself shows up in the author list of the related published papers).

▲

jychang 2 hours ago | parent | next [-]

No, Opus cannot be 10x larger than the chinese models.

If Opus was 10x larger than the chinese models, then Google Vertex/Amazon Bedrock would serve it 10x slower than Deepseek/Kimi/etc.

That's not the case. They're in the same order of magnitude of speed.

	▲	bakugo an hour ago \| parent [-]
		I agree that Opus almost definitely isn't anywhere near that big, but AWS throughput might not be a great way to measure model size. According to OpenRouter, AWS serves the latest Opus and Sonnet at roughly the same speed. It's likely that they simply allocate hardware differently per model.

▲

bakugo an hour ago | parent | prev [-]

GPT-4 was likely much larger than any of the SOTA models we have today, at least in terms of active parameters. Sparse models are the new standard, and the price drop that came with Opus 4.5 made it fairly obvious that Anthropic are not an exception.