hirako2000 2 hours ago

> Qwen 3.5 397B-A17B is a good comparison

It is not. It's a terrible comparison. Qwen, DeepSeek, and other Chinese models are known for being 10x or even more efficient than Anthropic's.

That's why the difference between OpenRouter prices and the official providers' isn't that large. Plus, who knows what OpenRouter providers do in terms of quantization; they may be getting 100x better efficiency, hence the competitive price.
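
To make the quantization point concrete, here is a rough sketch (all sizes are hypothetical placeholders, not figures for any real provider) of how weight precision changes the memory traffic per decoded token, which is the usual serving bottleneck:

    # Rough sketch: bytes of weights streamed per decoded token at different precisions.
    # The 17B active-parameter count is a hypothetical MoE example, not a real model's figure.
    BYTES_PER_PARAM = {"BF16": 2.0, "Q8": 1.0, "Q4": 0.5}

    def gb_streamed_per_token(active_params_billion, fmt):
        # Assumes batch-1 decoding where every active parameter is read once per token.
        return active_params_billion * BYTES_PER_PARAM[fmt]  # GB, since 1e9 params * bytes / 1e9

    for fmt in ("BF16", "Q8", "Q4"):
        print(f"{fmt}: ~{gb_streamed_per_token(17, fmt):.1f} GB/token")
    # BF16 ~34 GB/token vs Q4 ~8.5 GB/token: a 4x cut in memory bandwidth per token,
    # which translates almost directly into cheaper serving for a bandwidth-bound workload.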

That being said, not all users max out their plan, so it's not like each user costs Anthropic 5,000 USD. The hemorrhage would be so brutal they would be out of business in months.
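
As a hedged back-of-the-envelope (every number below is a made-up assumption, not Anthropic data), what matters is the blended cost across the utilization distribution, not the worst-case user:

    # Hypothetical blended-cost calculation for a flat-rate plan.
    PLAN_PRICE = 200.0         # USD/month for a Max-style plan (assumed)
    MAX_COMPUTE_COST = 5000.0  # USD of inference if a user truly maxed the plan (assumed)

    # pairs of (share of subscribers, fraction of the cap they consume); all assumptions
    utilization = [
        (0.50, 0.05),  # half the users barely touch it
        (0.35, 0.25),
        (0.10, 0.60),
        (0.05, 1.00),  # the power users who really do max it out
    ]

    blended = sum(share * frac * MAX_COMPUTE_COST for share, frac in utilization)
    print(f"blended cost per subscriber: ~${blended:.0f}/month vs ${PLAN_PRICE:.0f} revenue")
    # With these made-up numbers the average subscriber costs roughly $1,100/month: far below
    # the $5,000 worst case, but whether it beats the plan price depends entirely on the real
    # utilization curve, which outsiders don't know.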

jychang 2 hours ago | parent | next [-]

That's circular reasoning. People think Chinese models are 10x more efficient because they're 10x cheaper, and then you use that to claim that they're 10x more efficient.

Opus isn't that expensive to host. Look at Amazon Bedrock's t/s numbers for Opus 4.5 vs. the Chinese models. They're around the same order of magnitude, which means that Opus has roughly the same number of active params as the Chinese models.

Also, you can select BF16 or Q8 providers on OpenRouter.

re-thc 26 minutes ago | parent [-]

> That's circular reasoning. People think Chinese models are 10x more efficient because they're 10x cheaper

They do have different infrastructure and electricity costs, and they might not run on Nvidia hardware.

It's not just the models.

jychang 16 minutes ago | parent [-]

Except there are providers that serve both Chinese models and Opus, on the same hardware.

Namely, Amazon Bedrock and Google Vertex.

That means normalized infrastructure costs, normalized electricity costs, normalized hardware performance, and (most likely) a normalized inference software stack. It's about as close to a 1-to-1 comparison as you can get.

Both Amazon and Google serve Opus at roughly half the speed of the Chinese models. Note that they are not incentivized to slow down the serving of either Opus or the Chinese models! So that tells you the ratio of active params between Opus and the Chinese models.
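
A minimal sketch of the inference here, assuming batch-1, memory-bandwidth-bound decoding on identical hardware (the tokens/sec figures below are placeholders, not actual Bedrock/Vertex numbers):

    # Under bandwidth-bound decoding, tokens/sec is roughly
    #   tok_per_s ~ bandwidth / (active_params * bytes_per_param)
    # so on the same hardware the active-param ratio is the inverse of the speed ratio.

    def estimated_active_param_ratio(tok_per_s_a, tok_per_s_b):
        # Ratio of model A's active params to model B's, inferred from observed speeds.
        return tok_per_s_b / tok_per_s_a

    opus_tps, moe_tps = 30.0, 60.0  # placeholder speeds: Opus at about half the MoE's speed
    print(f"estimated ratio: {estimated_active_param_ratio(opus_tps, moe_tps):.1f}x")
    # Prints 2.0x: roughly double the active params, not 10x, which is the argument
    # against the "Opus is simply 10x larger" hypothesis.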

Weaver_zhu 17 minutes ago | parent | prev | next [-]

Agree, but I'd guess that Opus 4.6 is 10x larger, rather than the Chinese models being 10x more efficient. GPT-4 is said to already be a 1.6T-parameter model, and Llama 4 Behemoth is also much bigger than the Chinese open-weight models. Chinese tech companies are short on frontier GPUs, but they have done a lot of innovation on inference efficiency (DeepSeek CEO Liang Wenfeng himself appears in the author lists of the related published papers).

jychang 11 minutes ago | parent [-]

No, Opus cannot be 10x larger than the Chinese models.

If Opus were 10x larger than the Chinese models, then Google Vertex/Amazon Bedrock would serve it 10x slower than DeepSeek/Kimi/etc.

That's not the case. They're in the same order of magnitude of speed.

simianwords 2 hours ago | parent | prev | next [-]

> It is not. It's a terrible comparison. Qwen, DeepSeek, and other Chinese models are known for being 10x or even more efficient than Anthropic's.

I find it a good comparison because it's a useful baseline, given that we have zero insider knowledge of Anthropic. It gives me an idea of the cost associated with a model of a certain size.

I don't buy the 10x efficiency thing: they are just lagging behind the performance of the current SOTA models. They perform much worse than the current models while also costing much less - exactly what I would expect. Current Qwen models perform about as well as Sonnet 3, I think. Two years from now, when Chinese models catch up with enough distillation attacks, they'll be as good as Sonnet 4.6 and still be profitable.

lelanthran 2 hours ago | parent | prev [-]

> That being said, not all users max out their plan,

These are not cell phone plans that the average Joe signs up for; they are plans purchased with the explicit goal of software development.

I would guess that 99 out of every 100 plans are purchased with the intention of maxing them out.

serial_dev 2 hours ago | parent | next [-]

I’m not maxing them out… I have issues I need to fix, features I need to develop, and things I want to learn.

When I have a feeling that these tools will speed me up, I use them.

My client pays for a couple of these tools in an enterprise deal, and I suspect most of us on the team work like that.

If my goal were to max out every tool my client pays for, I’d be working 24 hours a day and never see sunlight.

I guess it’s like an all-you-can-eat buffet. Everybody eats a lot, but if you eat so much that you throw up and get sick, you are special.

bloppe 22 minutes ago | parent [-]

I am special

Ginden an hour ago | parent | prev | next [-]

My employer bought me a Claude Max subscription. On heavy weeks I use 80% of the subscription. And among software engineers that I know, I'm a relatively heavy user.

Why? Because in my experience, the bottleneck is shareholders approving new features, not my ability to dish out code.

raihansaputra an hour ago | parent | prev | next [-]

Goal? Yeah. But in reality, just by timing it right (starting a session at 7-8am to get 2 sessions into a workday, or even 3 if you can schedule something at 5am), I rarely hit limits.

If I hit the limit, it usually means I'm not using it well and am hunting around. If I'm using it right, I'm basically gassed out before I can max out the limit.

solumunus an hour ago | parent | prev | next [-]

There’s absolutely no way that’s true.

rustystump an hour ago | parent | prev [-]

In SaaS this is not true. Most SaaS is highly profitable (or was, I suppose) because they knew that most of their customers would never max out their plans.