For large corporates and other entities of any size, the threat of the core of your infrastructure suddenly getting disabled because of something like this is going to be untenable. I predict the pressures for on-prem, offline access (whether by licensing weights or getting them in a restricted setting like TEE/CC) will be overwhelming and one of the players will fill the need.

dansquizsoft 5 hours ago

Thinking that on-prem models will be a halfway decent solution against what can be served out of a data center is a fool's take... One that is more common than it should be here...

wolttam 5 hours ago

The point is not to be as good as the multi-trillion-parameter model you can host across 72 GPUs (or whatever).

I'm running a 248B model on a paltry amount of hardware and getting plenty of good use out of it.

Sure, the most demanding tasks will demand the best models (and always will). There are still less demanding tasks for other models.

I think some people are fooling themselves into believing that coding, of all tasks, is always going to require the biggest models ever. Again, maybe some coding tasks will, but the majority of business CRUD apps probably don't. The same goes for virtually any other type of task. The biggest models are really only useful for the most complex tasks.

sgc 4 hours ago

If you wouldn't mind, could you explain a bit what the 248B model is good for, and where it breaks down and you need something better? I hear this take often, but it's always a fleeting remark, so I have no idea what "useful" looks like at all.

	ihateolives an hour ago
		In my experience they require much more hand-holding and more specific directions, leaving fewer ways to interpret a command. You do the planning, keep an eye on what they're producing, and they do the legwork. It's not that their knowledge of Java or PHP or what have you is lacking; it's the long-horizon planning that you have to do yourself. Technically they're good. You just have to do more thinking and more reviewing yourself. YMMV.
	wolttam 2 hours ago
		To answer this and my sibling, it's DeepSeek V4 Flash at native FP4 quantization, on two Nvidia DGX Sparks. Which is a bit of kit but still paltry relative to the data centre. ~40 TPS generation, ~2000 TPS prompt processing, which makes it feel approximately as fast as typical APIs. I primarily use it with my own harness for coding. I'm not going to say it will compete with Opus in the most challenging domains, because it won't, but I will say that there's a reasonable likelihood that Opus is used for tasks that a model like Flash could comfortably handle at 1/100th the cost. So far I've only seen it struggle at tasks that I myself would struggle with. Tasks that I can describe the shape of the solution for, it has a high success rate at implementing. Useful is going to be different for everyone. I'm not working on the hardest problems, I don't need the best models.
	rhipitr 3 hours ago
		Depending on quantization, I figure they'd need at least a p4 and likely a p5 EC2 instance (or a similar instance at another provider) for a model with that many parameters. Maybe they are hosting on bare metal, but I imagine not. Those instance types (assuming you're not using spot) are quite expensive to run.
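For a rough sense of why the quantization level matters so much here: weight memory scales as parameters × bits per parameter / 8. The sketch below is a back-of-the-envelope estimate only, assuming the 248B figure and the ~2000 TPS prompt / ~40 TPS generation figures quoted in this thread; it ignores KV cache, activations and runtime overhead (all of which add to the total), and the request sizes are hypothetical.

```python
def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes).

    Excludes KV cache, activations and framework overhead.
    """
    return params * bits_per_param / 8 / 1e9


def request_seconds(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float, decode_tps: float) -> float:
    """Rough wall-clock time: prompt processing plus token generation."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


PARAMS = 248e9  # parameter count quoted in the thread

for label, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.0f} GB of weights")

# Hypothetical request: 20,000-token prompt and a 1,000-token reply,
# using the ~2000 TPS prefill and ~40 TPS decode rates quoted above.
print(f"~{request_seconds(20_000, 1_000, 2000, 40):.0f} s per request")
```

At FP4 the weights alone come to roughly 124 GB, versus about 496 GB at FP16, which is the gap between "fits on a couple of desktop boxes" and "needs a multi-GPU cloud instance".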

upbeat_general 4 hours ago

If we’re defining on-prem as fitting in a rack, then every frontier model can be hosted on-prem.

Now, this might not be the most cost-effective option (and may require a bit of extra power), but you only need a datacenter for training or cost optimization.

WarOnPrivacy 5 hours ago

> I predict the pressures for on-prem, offline access ... will be overwhelming and one of the players will fill the need.

I'd agree, except that Big AI has made sure that most of us can't afford the hardware (RAM, NVMe, etc.) to run it.

	Folcon 5 hours ago
		Honestly at this point I'm not sure how much that matters?

stevarino 5 hours ago

This is ignoring the fact that the government is the foundation of society (I know some will disagree with that, but the end result is just government with more steps).

Private models in a low trust society means the government will come and seize the models. Competitive business will only be allowed through cronyism.

The better option is to opt for high trust. Yes, the G-man can rip your servers apart, but they know they'll face consequences, legal and political. Laws and regulations are the answer, not locking down into smaller fiefdoms.

senderista 4 hours ago

You get high trust through social norms, not by more "laws and regulations". Social norms can't be imposed by fiat, they arise spontaneously, often for unclear reasons. That's why they're so fragile and precious. With Trump's destruction of social norms around the presidency and the federal government generally, the US is now just another country where bribery is the cost of doing business.

	iamnothere 4 hours ago
		Through social norms and through policies that ensure the public on average feels prosperous and secure.

sgrove 6 hours ago

Likely many points along the Pareto frontier.

Some will take greater risks and win (or lose); others will play it safer and slowly accumulate wins (or be obsoleted).

Never mind the threat of letting these models write code that runs your business, or operate it agentically: models trained by actors (corporate or nation-state) diametrically opposed to your interests.

Lots to take into account now; an interesting time to be in business.

bryzio 4 hours ago

Or abstract the provider away (e.g. with OpenRouter), which reduces the risk vector to "all implementations have been simultaneously banned".

If a government entity bans an LLM provider due to a jailbreak concern, it can also ban an on-prem solution under the same guise. The jailbreak risk exists regardless of where the model is hosted. You could defensibly argue the on-prem risk is higher: frontier model companies can justify safety spend due to their size, whereas it's more difficult to combat bad actors if your company is the only one using the model and you don't have economies of scale.
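The abstraction point can be sketched as a thin layer that tries interchangeable backends in order. This is a minimal sketch, not OpenRouter's actual API: `complete_with_fallback`, `hosted_api` and `local_model` are hypothetical stand-ins for whatever clients you really use. As the comment notes, it only protects against losing one provider; if every backend is blocked, it still fails with `AllProvidersFailed`.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every configured backend has refused or failed the request."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each backend in order; return (provider_name, completion) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. revoked access, outage, policy block
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical backends: a hosted API that has been cut off, and a local on-prem model.
def hosted_api(prompt: str) -> str:
    raise PermissionError("access revoked")


def local_model(prompt: str) -> str:
    return f"local answer to: {prompt}"


name, text = complete_with_fallback(
    "summarize the ticket",
    [("hosted", hosted_api), ("local", local_model)],
)
print(name, text)
```

Ordering the list puts the preferred (often cheapest or strongest) backend first; the on-prem entry acts as the fallback that keeps the business running if hosted access is yanked.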

yogthos 5 hours ago

This is precisely why I expect that Chinese open models are going to win in the long run. The capability difference isn't dramatic in the grand scheme of things, but the fact that you can run your own is a huge selling point. Even if you rent an open model from a Chinese company, you can switch to on-prem if they decide to yank access or change terms in a way you don't like. It might be a pain, but it wouldn't be existential. On the other hand, if you become dependent on a closed model and it gets yanked, then you're in a world of hurt.

And infrastructure dominance is really the big picture here. Chinese models are going to become the standard setters because they're going to be what people are using. That means more research, more tooling, and a whole ecosystem developing around them.

And that was already starting to happen even before this fiasco, with Chinese models now being the most used ones globally. https://www.indiatoday.in/amp/technology/features/story/clau...

UncleOxidant 5 hours ago

After this action, I have no doubt that this administration will try to ban Chinese models. Of course, doing so will be futile, we'll figure out ways to get around it, but now I'm pretty sure they're going to try.

yogthos 5 hours ago

I'm waiting for that to happen as well since the price difference makes it very difficult for companies like Anthropic and OpenAI to compete. And we already have precedent for this with stuff like EVs, phones, and so on. As soon as Chinese companies start making a product that's more popular, they get banned on some national security pretext.

The tricky part with banning Chinese models is that they're open. It'll be easy to ban access to service providers, but preventing people from running these models on-prem is going to be really tough. Like, are they going to go after Cursor, for example, given that its model is based on Kimi?

I very much agree it's going to be a futile endeavour in the end. It kind of reminds me of the time Microsoft tried to get Linux and open source banned when Linux started encroaching on the Windows server market. This is going to end the same way.

	UncleOxidant 5 hours ago
		I'm going to guess they'll go after sites like Huggingface that host downloads. I suspect we'll be torrenting Chinese models in the not-too-distant future. Or we'll have to geo-spoof with VPN to download from other countries.

AbstractH24 4 hours ago

Why? None of the various cloud provider outages ever have.

duped 5 hours ago

[flagged]

	hackmack10 3 hours ago
		Great point. That is what all the Fortune 500 CEOs are frothing at the mouth about: having LLMs replace their payroll. So yeah, they deserve to fail.