applfanboysbgon 2 days ago

No, but businesses do. Being able to run quality LLMs without your business, or business's private information, being held at the mercy of another corp has a lot of value.

forrestthewoods 2 days ago | parent | next [-]

What type of system is needed to self host this? How much would it cost?

fragmede 2 days ago | parent | next [-]

One GB200 NVL72 from Nvidia would do it. $2-3 million, or so. If you're a corporation, say Walmart or PayPal, that's not out of the question.

If you want to go budget corporate, 7x H200s will just barely run it, but all in, $300k ought to do it.
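A rough way to sanity-check that "7x H200" figure is simple VRAM arithmetic. The numbers below are illustrative assumptions (a ~700B-parameter model served in FP8, with a fudge factor for KV cache and runtime overhead), not the specs of any particular model:

```python
import math

# Back-of-envelope VRAM sizing for self-hosting a large LLM.
# All numbers are illustrative assumptions, not published specs.
params_b = 700       # assumed model size, billions of parameters
bytes_per_param = 1  # FP8 quantization: 1 byte per parameter
overhead = 1.3       # assumed factor for KV cache, activations, CUDA buffers

weights_gb = params_b * bytes_per_param  # 1e9 params * 1 B = 1 GB
total_gb = weights_gb * overhead

h200_gb = 141  # HBM3e capacity of a single H200
gpus_needed = math.ceil(total_gb / h200_gb)
print(f"~{total_gb:.0f} GB total -> {gpus_needed}x H200")
```

With these assumptions you land right around 7 GPUs, which is why the fit is described as "just barely": there is little headroom left for KV cache under heavy load.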

gloflo 2 days ago | parent [-]

How many users can you serve with that?

fragmede 2 days ago | parent [-]

For the H200 setup, between 150 and 700. The GB200 gets you something like 2-10k users.

forrestthewoods 2 days ago | parent [-]

Whoa. How on earth can one system serve 2000 potentially concurrent users?

disiplus 2 days ago | parent | prev | next [-]

Depends on how many users you have and what "production grade" means for you, but around $500k gets you an 8x B200 machine.

p1esk 2 days ago | parent | prev | next [-]

Depends on how fast you want it to be. I'm guessing a couple of $10k Mac Studio boxes could run it, but probably not fast enough to enjoy using it.

CamperBob2 2 days ago | parent | prev [-]

$20K worth of RTX 6000 Blackwell cards should let you run the Flash version of the model.

choldstare 2 days ago | parent | prev [-]

Not really - on-prem LLM hosting is extremely labor- and capital-intensive.

applfanboysbgon 2 days ago | parent [-]

But can be, and is, done. I work for a bootstrapped startup that hosts a DeepSeek v3 retrain on our own GPUs. We are highly profitable. We're certainly not the only ones in the space, as I'm personally aware of several other startups hosting their own GLM or DeepSeek models.

wuschel 2 days ago | parent [-]

Why a retrain? What are you using the model for?