8 X RTX6000. It will run you around 80-100k to get started with a model of this size at decent tps.

Don't worry though, open source evangelists will tell you that these will be running on your phone in the next 3 years.

For $100k you could run this model 24/7 through OpenRouter with 10 concurrent sessions at 50 tps for a decade and have money left over for a vacation. There's no point in investing this kind of money in local models unless you have a business where you're already paying for many employees' individual token usage.
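
Taking the comment's own numbers at face value, a quick calculation shows what per-token price the "$100k for a decade" claim implicitly assumes. This is a rough sketch, assuming every session streams output nonstop at exactly 50 tps and that a single blended price applies per token; the helper names are illustrative, and no real OpenRouter price is looked up.

```python
# Back-of-the-envelope check of the "$100k buys a decade of API tokens" claim.
# Inputs are the comment's figures (10 sessions, 50 tps, 10 years, $100k).
# The output is the highest blended price per million tokens at which the
# claim still holds. No actual OpenRouter pricing is assumed.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap days


def total_tokens(sessions: int, tokens_per_second: float, years: float) -> float:
    """Tokens generated by `sessions` streams running nonstop for `years`."""
    return sessions * tokens_per_second * SECONDS_PER_YEAR * years


def break_even_price_per_million(budget_usd: float, tokens: float) -> float:
    """Highest price per 1M tokens that keeps total spend within the budget."""
    return budget_usd / (tokens / 1_000_000)


tokens = total_tokens(sessions=10, tokens_per_second=50, years=10)
price = break_even_price_per_million(100_000, tokens)
print(f"{tokens:.3e} tokens; break-even at ${price:.2f} per million tokens")
```

That works out to roughly 1.58 × 10^11 tokens and a break-even price of about $0.63 per million tokens, so the claim holds only if the blended price stays below that. The sketch counts only generated tokens, not the prompt/input tokens each session would also be billed for.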

Aurornis 2 hours ago

> 8 X RTX6000. It will run you around 80-100k to get started

8 x RTX6000 GPUs cost $100,000 alone. You then need to build a system that can support those GPUs with enough PCIe lanes through a PCIe switch.

It's going to be $120K to $150K to build or buy a system to run this.

knollimar 44 minutes ago

Isn't throwing that money into [insert financial vehicle that gives 99.99999% safe returns] going to crush this once you factor in electricity costs?

Or even just electricity costs vs. token costs?
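
One way to make that comparison concrete is to put rough yearly numbers on owning the box. A sketch with assumed inputs, none of which come from the thread except the 8-GPU count and a ~$120k build cost: 600 W per GPU at full load, 1 kW for the rest of the host, $0.15/kWh, and a 4% low-risk annual return. The helper names are illustrative, and hardware depreciation is deliberately left out.

```python
# Rough yearly running cost vs. opportunity cost of a local inference box.
# ASSUMED inputs (not from the thread): 600 W per GPU, 1000 W host overhead,
# $0.15/kWh electricity, 4% annual "safe" return. From the thread: 8 GPUs and
# a build cost in the $120k-$150k range. Depreciation is not modeled.

HOURS_PER_YEAR = 365 * 24


def yearly_electricity_usd(gpus: int, watts_per_gpu: float, host_watts: float,
                           usd_per_kwh: float, utilization: float = 1.0) -> float:
    """Energy cost if the GPUs draw `utilization` x their rated power all year."""
    kw = (gpus * watts_per_gpu * utilization + host_watts) / 1000
    return kw * HOURS_PER_YEAR * usd_per_kwh


def yearly_opportunity_cost_usd(capital_usd: float, annual_rate: float) -> float:
    """Return forgone by not putting the capital into a low-risk asset."""
    return capital_usd * annual_rate


power = yearly_electricity_usd(gpus=8, watts_per_gpu=600, host_watts=1000, usd_per_kwh=0.15)
interest = yearly_opportunity_cost_usd(capital_usd=120_000, annual_rate=0.04)
print(f"electricity ~${power:,.0f}/yr, forgone return ~${interest:,.0f}/yr")
```

Under those assumptions the box costs about $7.6k a year in electricity running flat out, and the forgone return is about $4.8k a year. Both scale linearly with the inputs, so substitute your own power draw, tariff, and rate.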

CamperBob2 2 hours ago

You can run the NVFP4 quant on 8x RTX 6000 cards at 50-75 tps output, but not (practically speaking) the OEM FP8 version. You will learn more about PCIe than you ever wanted to know.

The real gangstas are running 16x RTX 6000s. Too rich for my blood, and the NVFP4 quant doesn't seem to be that much worse.
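
Much of why FP8 is impractical on eight cards plausibly comes down to weight memory. A rough sketch, assuming the RTX PRO 6000 Blackwell at 96 GB per card (768 GB total) and counting NVFP4 as 4-bit values plus one 8-bit scale per 16-value block (about 4.5 bits per weight). The parameter count is a hypothetical placeholder because the thread never states one, and KV cache, activations, and parallelism overhead are all lumped into "headroom".

```python
# Weight-memory check: how much VRAM is left after loading the weights?
# ASSUMPTIONS: 96 GB per card (RTX PRO 6000 Blackwell); NVFP4 counted as
# 4-bit elements plus an FP8 scale shared by each 16-element block; FP8 as
# 8 bits/weight. PARAMS_BILLIONS is a HYPOTHETICAL placeholder.

GPU_COUNT = 8
VRAM_PER_GPU_GB = 96

BITS_PER_WEIGHT = {
    "FP8": 8.0,
    "NVFP4": 4.0 + 8.0 / 16,  # 4-bit value + one 8-bit scale per 16 values
}


def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9


def headroom_gb(params_billions: float, fmt: str) -> float:
    """VRAM left for KV cache, activations and runtime after loading weights."""
    total_vram = GPU_COUNT * VRAM_PER_GPU_GB
    return total_vram - weight_gb(params_billions, BITS_PER_WEIGHT[fmt])


PARAMS_BILLIONS = 700  # hypothetical model size, not from the thread
for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{headroom_gb(PARAMS_BILLIONS, fmt):.0f} GB left for KV cache etc.")
```

With the 700B placeholder, FP8 weights alone take about 700 GB and leave roughly 68 GB across all eight cards for everything else, while NVFP4 leaves about 374 GB. Swap in the real parameter count to redo the check.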

Sanzig 29 minutes ago

Has anyone done any benchmarks on the NVFP4 quant? I'm seriously considering pitching an 8x RTX 6000 Pro box at work to run GLM-5.2 in an air-gapped environment.

tiahura 5 minutes ago

Good luck. I’m in the legal field, and even there, selling airgapped is tough.

krackers 2 hours ago

Would you be better off pooling that money with a hackerspace group and setting up shared inference infra, so that you at least get better utilization?

KaoruAoiShiho an hour ago

And before you know it, you've reinvented an OpenRouter provider from first principles...

janalsncm 44 minutes ago

Right. For example, you will need to figure out how to share it and who maintains it.

8note 2 hours ago

You can, however, have fun with it.

Oil workers buy $100k trucks they don't do much with. Why not $100k in computer?

jliptzin 20 minutes ago

Yeah, as far as hobbies go, I feel like this is on the low end. I know people who collect watches and Corvettes; that's way more expensive, and functionally you can't really do anything special with them.

theteapot 4 minutes ago

The difference is that watches and Corvettes typically appreciate in value, whereas computer hardware typically drops like a rock.

Ken_At_EM 2 hours ago

I can't help but ask where this comment came from; you must have some exposure.

CamperBob2 2 hours ago

It is so easy to spend $100K on a pickup truck these days, it's not even funny.

tiahura 4 minutes ago

A Honda minivan is over $50k.

SV_BubbleTime 25 minutes ago

A factory F-350 Platinum is at least $90k sticker.

afavour an hour ago

Because car loans can’t be used to buy computers.

ElProlactin 34 minutes ago

And there's your idea. If you could find a way to get people to add another $500/month over 80+ months to an auto loan, dealers would eat that up like filet mignon.

dakolli 2 hours ago

Sure, if you want to light money on fire for entertainment, more power to you. There are probably worse ways to light $100k on fire. If I had an extra $100k lying around, though, it would go to my family.

InvertedRhodium an hour ago

It depends on how much you value privacy and running uncensored models.

Personally, I’m waiting for hardware to hit the secondary market before I buy something to run unquantized models like GLM. But I have no doubt that I will, at some point.

KetoManx64 an hour ago

As an individual, I do not need the whole model. I don't need the model to have knowledge of the rain history of Algeria nor how many colors are in the Russian flag. Once they start trimming down the excess and making models field-focused, they will run just fine on people's individual devices.

JumpCrisscross an hour ago

> I do not need the whole model. I don't need the model to have knowledge of the rain history of Algeria nor how many colors are in the Russian flag

Doesn’t the performance gap between quantized and full models indicate that, even if you aren’t using it directly, the model knowing the colors of the Russian flag has something to do with the intelligence you demand?

KetoManx64 an hour ago

Do quantized models specifically prune out particular knowledge? I think they just compress things down, but the knowledge is still in there. You'd most likely need to do that pruning during initial model training, but I'm not an expert.

kibwen an hour ago

Quantizing is one thing. But in general, it's self-evident that training the model on information that is irrelevant to your use case does not necessarily improve ability; otherwise you'd have AGI just from reinforcing your model on memorizing the first 10^50 digits of pi.

Likewise, LLMs do not violate the laws of information theory, and therefore the only way to encode X amount of information in Y amount of bits where X > Y is by performing what is effectively lossy compression, and as X grows larger relative to Y, the compression ratio must change to lose ever more information.

Yes, for the sake of making chatbots that are "conversational", in that they can interpret natural language as input and produce code as output, you can easily benefit in incidental and unintuitive ways by training on more natural language text. But for a given fixed parameter size, it's possible to produce a better model for a specific task by selectively not muddying its training set in the first place with things that are likely irrelevant to the task.

rekttrader 2 hours ago

Or you have data covered by HIPAA or GDPR, or other PII, or you have to worry about others training on your data.

dakolli 2 hours ago

That too.

wonnage an hour ago

Yeah, the neoclouds and hyperscalers are taking massive losses right now; self-hosting is basically signing yourself up to do the same. There are philosophical reasons to do it, but it’s a terrible economic decision.

dist-epoch 2 hours ago

> 50tps for a decade

Assuming demand doesn't keep increasing. Apparently even Google has trouble getting enough capacity.