pants2 5 hours ago

The Chinese models are only cheap on subsidized Chinese hosting. I have yet to find a USA-hosted Chinese model with a very clear value advantage over US models.

weitendorf 3 hours ago | parent | next [-]

There are basically two tiers of "Chinese models" in this context, the "edge" sized ones with ~30B parameters or less, and the big ~1T models that can basically only run in the datacenter.

I don't think it's as simple as saying China's hosting is subsidized. They have generally cheaper electricity and labor costs than the US, no access to the top tier models, and a large internal market where the big models are the best thing they can run with what they have. So obviously they max out on their top models (which are trained with their hardware market in mind, not ours) and get the economy of scale from that, and can run generally the same hardware for less money than in the US because

The edge models are very cheap to run and work on inexpensive hardware. They are like 95% cheaper to run than Haiku, so the math is in their favor for certain batch workloads. Most people who do that just run the models for themselves without making them available on openrouter or whatever, because you can just provision a gpu node and use it as needed, and it's not that expensive to run this family of models.

Is your problem that you want to call Chinese models hosted in the US because you're worried about the data handling?

pants2 3 hours ago | parent [-]

I obviously don't know the full economics of the Chinese-hosted models, but estimates[1] put the cost of hardware (servers + networking) at 70-80% of the total cost. Those things aren't meaningfully cheaper in China, so serving DeepSeek at 1/3 the cost of the cheapest US provider doesn't really compute unless it's heavily subsidized or we believe that Chinese engineers are just that much better at optimization.
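A rough sketch of the arithmetic behind that claim. It assumes the 70-80% hardware share holds for a Chinese provider too and that hardware costs the same in both countries; both are the comment's assumptions, not measured data, and the function name is only illustrative.

```python
def relative_cost(hardware_share: float, non_hardware_discount: float) -> float:
    """Serving cost relative to a US provider (1.0 = same cost).

    Assumes hardware is priced identically and only the non-hardware
    share (power, labor, etc.) gets cheaper. A discount of 1.0 means
    everything other than hardware is free.
    """
    return hardware_share + (1.0 - hardware_share) * (1.0 - non_hardware_discount)


for share in (0.70, 0.80):
    floor = relative_cost(share, non_hardware_discount=1.0)
    print(f"hardware share {share:.0%}: cost floor is {floor:.0%} of the US cost")
```

Under those assumptions, even free power and labor leave the cost at 70% or more of the US figure, well above 1/3. The conclusion only holds if the hardware really is priced the same.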

Edge models, yes, they can be convenient to run batch jobs locally. I still would argue there's no economic benefit over paying for models. Haiku has a bad price/perf but others in that class are significantly cheaper in hosted APIs.

Doesn't matter what I think, the reality is that the majority of enterprises (where the real $ comes from) will not consider sending their data to China.

1. https://epoch.ai/data-insights/ai-datacenter-cost-breakdown

torginus 8 minutes ago | parent [-]

Hardware is arbitrarily priced, with the floor being as little money as it costs to make it, and the ceiling being how much competitors are willing to pay for it - the latter is much more of the driver of current pricing in the West than in China.

In a free market, the country would not matter, but Chinese models are often running on domestic hardware which does not directly compete with Nvidia GPUs, and thus the vendors can't get away with charging as much for it.

wg0 4 hours ago | parent | prev | next [-]

Not true. Also, put Deepseekv4 Flash on your local machine with effort set to "high" and you'll see that many, many people are using that model on their own machines without paying anyone anything.

It's just that some of us didn't imagine having GPUs would be advantageous and were not gamers on the side. Those who had beefy GPUs or GPU rigs for any reason rarely need to go anywhere else.

At least I am so impressed with Deepseekv4 AFTER using Claude Opus 4.7 for a significant amount of time that I am not going anywhere but Deepseekv4.

The model is just INSANE. Things I have done with it include attempting to write a 2.5D game engine in C with full animation and map rendering layer by layer.

pants2 4 hours ago | parent [-]

You'll need to spend at least $20K on a workstation that can run DS4 Flash. It would take ages to reach that much in token spend at the speeds it runs at, and if you factor in electricity costs you will likely never break even vs using the API.

ekidd 5 hours ago | parent | prev | next [-]

The Chinese models are surprisingly cheap and performant sitting under my desk. Qwen3.6 27B is nowhere near as autonomous as Opus 4.7, but it runs in 24GB of VRAM. And it's actually great for the use cases where I'm going to carefully read and understand all the code anyway.

If you want to support a team of engineers, DeepSeek V4 Flash is antirez's current favorite. And you could support a team of engineers pretty nicely for $40-50k. Which might not make sense if you're on a Claude MAX 5x plan or the old enterprise group plan with fixed price seats. But Anthropic is switching their enterprise contracts over to token-based pricing, at which point $50k is looking pretty good.

__mharrison__ 5 hours ago | parent | prev | next [-]

Odd take. I'm running them locally at my desk (DGX Spark and 128GB MBP). They work fine for 90% of what most folks do. Admittedly, they do run slower on my hw than on the cloud.

pants2 5 hours ago | parent [-]

Running them locally is cool and has privacy/autonomy benefits, but you can't really make a value case for it. Guaranteed if you run the math you will never run enough inference to pay off your hardware vs buying tokens. Last time I ran the math on my MBP I'd have to run inference 24 hours a day for 5+ years to pay off the cost of my MBP, not accounting for electricity costs.

iooi 5 hours ago | parent | next [-]

Is this because of the tok/s? Since it's pretty easy to run up a $5k bill in API usage for Claude/ChatGPT in a month.

pants2 4 hours ago | parent [-]

Yes, because of the limits on tok/s, and you have to compare apples to apples, not Gemma 27B to Opus 4.7.

hedora 4 hours ago | parent [-]

Assuming the local models get the job done (e.g., you adjust your workflow so that you can run the local machine 100% all the time, or whatever), then the time to payback isn't very high. MSRP for a 128GB AMD was $1400 at launch. That's 7 months of claude code subscription. If you assume a 5 year depreciation cycle, you can buy a cluster of 8 such machines and still come out ahead. (Power is a few hundred watts per machine peak -- maybe 7 machines if you include electricity.) Of course, I'm assuming non-bubble numbers. Those boxes are like $3K now. Still, a normal person would probably not buy 8 of them at once. Instead, they'd space out buying a machine every few years as the technology improves.
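A quick sketch of that payback arithmetic. The $200/month subscription price is not stated in the comment; it is inferred from "$1400 is 7 months", so treat it as an assumption. Electricity is left out, as in the first half of the estimate.

```python
MACHINE_PRICE_USD = 1400          # launch MSRP quoted in the comment
SUBSCRIPTION_USD_PER_MONTH = 200  # assumption: implied by "$1400 = 7 months"
DEPRECIATION_YEARS = 5


def payback_months(machine_price: float, monthly_fee: float) -> float:
    """Months of subscription fees that add up to one machine."""
    return machine_price / monthly_fee


def machines_within_budget(machine_price: float, monthly_fee: float, years: int) -> int:
    """Whole machines affordable for the subscription cost over `years`."""
    budget = monthly_fee * 12 * years
    return int(budget // machine_price)


print(payback_months(MACHINE_PRICE_USD, SUBSCRIPTION_USD_PER_MONTH))
print(machines_within_budget(MACHINE_PRICE_USD, SUBSCRIPTION_USD_PER_MONTH, DEPRECIATION_YEARS))
# Same budget at the ~$3K price mentioned for today's market
print(machines_within_budget(3000, SUBSCRIPTION_USD_PER_MONTH, DEPRECIATION_YEARS))
```

At the launch price the five-year subscription budget covers 8 machines; at roughly $3K each it covers 4.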

For me, things are getting better faster than my ability to review / trust the resulting code, so tok/sec isn't a bottleneck anymore. Instead, quality of the tokens is the bottleneck. That points to me wanting a 1TB DRAM iGPU once they're available at pre-bubble RAM pricing.

pants2 4 hours ago | parent [-]

You're comparing the highest tier Claude subscription to something like Qwen3.5-122B-A10B running locally, which is apples to oranges.

If you compare to a smarter US model like Grok 4.3, $1400 will pay for 560M output tokens, which at ~25 t/s locally using it nonstop for 8 hours a day would take two years to pay back. Not accounting for bubble prices or electricity.
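The two-year figure can be reproduced with a few lines. The $2.50 per million output tokens is not a checked list price; it is simply what "$1400 pays for 560M tokens" implies, so it is an assumption here, as is the function name.

```python
def payback_days(hardware_usd: float, usd_per_million_tokens: float,
                 tokens_per_second: float, hours_per_day: float) -> float:
    """Days of local generation needed to match the hardware price in API spend."""
    tokens_bought = hardware_usd / usd_per_million_tokens * 1_000_000
    tokens_per_day = tokens_per_second * 3600 * hours_per_day
    return tokens_bought / tokens_per_day


# $1400 box, $2.50/M output tokens (implied), ~25 t/s, 8 hours a day
days = payback_days(1400, 2.50, 25, 8)
print(f"{days:.0f} days, about {days / 365:.1f} years")
```

This ignores electricity and input-token costs, the same simplifications the comment makes.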

__mharrison__ 2 hours ago | parent [-]

Is the goal maximum t/s?

According to openrouter, Opus 4.8 is 128 t/s. So 10x faster than my antirez/ds4.

slopinthebag 3 hours ago | parent | prev | next [-]

The value of not having a reliance on a third party company, and not needing an internet connection, and having total privacy: ∞

fragmede 2 hours ago | parent | prev [-]

Just have to put some numbers on privacy and autonomy. What's the fine to my company if I get hacked and leak all my customers' PII? What's the cost in productivity lost if OpenAI/Anthropic/Google decides to suspend my account for an unknown reason?

harsh3195 4 hours ago | parent | prev | next [-]

You can find them on Deepinfra. Palo Alto company. Similar cheap price.

pants2 4 hours ago | parent [-]

Not similar. DeepInfra[1] has DS4 Pro pricing at $1.30/$2.60 which is 3X the Deepseek[2] (Chinese) hosting at $0.435/$0.87. DeepInfra is also very slow at 37 t/s and uses an FP4 quant[3], so intelligence will be degraded slightly.
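A small check of the 3X figure, using only the prices quoted above. The unit (USD per million tokens, input/output) is an assumption, since the comment does not state it.

```python
# Prices as quoted in the comment: input / output
deepinfra = {"input": 1.30, "output": 2.60}
deepseek = {"input": 0.435, "output": 0.87}

# Ratio of US-hosted price to DeepSeek's own hosting, per token type
ratios = {kind: deepinfra[kind] / deepseek[kind] for kind in deepinfra}

for kind, ratio in ratios.items():
    print(f"{kind}: DeepInfra is {ratio:.2f}x the DeepSeek price")
```

Both ratios come out just under 3, so the multiple is the same for input and output tokens.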

Meanwhile you could use Grok 4.3 for the same price which is smarter and 5X faster[4].

1. https://deepinfra.com/pricing

2. https://api-docs.deepseek.com/quick_start/pricing

3. https://artificialanalysis.ai/models/deepseek-v4-pro/provide...

4. https://artificialanalysis.ai/models/grok-4-3

wirybeige 2 hours ago | parent [-]

DS4 Pro/Flash were post-trained with QAT, so they are already quantized to FP4 for the most part. That's why the downloaded weights are much smaller than they would be at fp8 or fp16. For example, Flash is a 284B model, but its weights are only ~160GB. OFC maybe DeepInfra went even further, but there is no proof of that.
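A back-of-the-envelope sketch of that size argument, using the 284B parameter count and ~160GB figure from above. It assumes 1 GB = 10^9 bytes and ignores metadata and non-weight files; the helper names are illustrative.

```python
def weight_size_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight size in GB for a given precision."""
    return params_billion * bits_per_param / 8


def effective_bits(params_billion: float, size_gb: float) -> float:
    """Average bits per parameter implied by an on-disk size."""
    return size_gb * 8 / params_billion


for name, bits in (("fp16", 16), ("fp8", 8), ("fp4", 4)):
    print(f"{name}: {weight_size_gb(284, bits):.0f} GB")

print(f"~160 GB implies {effective_bits(284, 160):.1f} bits per parameter")
```

The ~160GB size sits a little above the pure FP4 estimate and well below FP8, which fits "quantized to FP4 for the most part" with some tensors presumably kept at higher precision.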

slopinthebag 3 hours ago | parent | prev [-]

Huh? They're several times cheaper than SOTA models at market rate prices.

pants2 2 hours ago | parent [-]

If you are only looking at US hosting providers, models from US labs easily meet or beat models from Chinese labs on the same intelligence level. I'm not comparing DeepSeek with Opus because those are on different levels of performance.

slopinthebag 2 hours ago | parent [-]

Deepseek v4 Pro on US hosting is like 1.5x cheaper on input and 5x cheaper on output compared to Sonnet, and that's not even a fair comparison because Deepseek is much stronger than Sonnet. It's more reasonable to compare with Opus 4.5, which is much more expensive.