bArray an hour ago

Apparently GLM 5.2 is 753B parameters [1], what kind of hardware are people using to run this locally?

[1] https://huggingface.co/zai-org/GLM-5.2
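
For a rough sense of scale, here is a back-of-the-envelope sketch of the weight storage implied by 753B parameters at a few common precisions. It counts weights only (no KV cache, activations, or runtime overhead), and the function name `weight_gb` is just an illustrative helper. Real 4-bit quants usually carry some scale metadata, so treat the last figure as a lower bound.

```python
PARAMS = 753e9  # parameter count quoted in the thread

def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return params * bits_per_param / 8 / 1e9

# Bit widths are nominal; quantized formats add a little overhead for scales.
for name, bits in [("BF16", 16), ("FP8", 8), ("4-bit", 4)]:
    print(f"{name}: ~{weight_gb(PARAMS, bits):.1f} GB")
```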

crocowhile an hour ago | parent | next [-]

follow antirez - https://x.com/antirez/status/2071173841175363905?s=20

JamesSwift an hour ago | parent [-]

That's quantized.

kccqzy 13 minutes ago | parent | prev | next [-]

Run quantized versions. https://unsloth.ai/docs/models/glm-5.2

dakolli an hour ago | parent | prev [-]

8 X RTX6000. It will run you around 80-100k to get started with a model of this size at decent tps.

Don't worry though, open source evangelists will tell you that these will be running on your phone in the next 3 years.

For $100k you could run this model 24/7 through OpenRouter with 10 concurrent sessions at 50tps for a decade and have money left over for a vacation. There's no point in investing this kind of money in local models unless you have a business where you're already paying for many employees' individual token usage.
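
The decade claim above can be sanity-checked with simple arithmetic. This sketch takes the figures from the comment (10 sessions, 50 tps, 10 years, $100k) and computes the total output tokens and the per-million-token price at which hosted usage would cost exactly $100k. It assumes 365-day years and counts output tokens only; input-token charges and price changes over time are ignored.

```python
SESSIONS = 10            # concurrent sessions, from the comment
TOKENS_PER_SECOND = 50   # per session, from the comment
YEARS = 10
BUDGET_USD = 100_000

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: ignore leap days

# Total output tokens generated if every session runs flat out for a decade.
total_tokens = SESSIONS * TOKENS_PER_SECOND * SECONDS_PER_YEAR * YEARS

# Price per million tokens at which the hosted bill equals the budget.
breakeven_usd_per_million = BUDGET_USD / (total_tokens / 1e6)

print(f"total tokens: {total_tokens:,}")
print(f"break-even price: ${breakeven_usd_per_million:.2f} per million tokens")
```

The claim holds only if the blended hosted price stays below that break-even figure for the whole period.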

KetoManx64 a few seconds ago | parent | next [-]

As an individual I do not need the whole model. I don't need the model to have knowledge of the rain history of Algeria nor how many colors are in the Russian flag. Once they start trimming down the excess and making them field-focused, they will run just fine on people's individual devices.

Aurornis 32 minutes ago | parent | prev | next [-]

> 8 X RTX6000. It will run you around 80-100k to get started

8 x RTX6000 GPUs cost $100,000 alone. You then need to build a system that can support those GPUs with enough PCIe lanes through a PCIe switch.

It's going to be $120K to $150K to build or buy a system to run this.

CamperBob2 22 minutes ago | parent [-]

You can run the NVFP4 quant with 8x RTX6000 cards at 50-75 tps output, but not (practically speaking) the OEM FP8 version. You will learn more about PCIe than you ever wanted to know.

The real gangstas are running 16x RTX6000s. Too rich for my blood, and the NVFP4 quant doesn't seem to be that much worse.
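
A rough sketch of why the 4-bit quant fits on 8 cards while FP8 is impractical there. It assumes 96 GB of VRAM per card (an assumption, not stated in the thread; adjust `VRAM_PER_CARD_GB` for your hardware) and the 753B parameter count quoted at the top. It reports what is left after weights for KV cache, activations, and runtime overhead.

```python
PARAMS = 753e9          # parameter count quoted in the thread
VRAM_PER_CARD_GB = 96   # assumption: per-card VRAM, not stated in the thread

def headroom_gb(cards: int, bits_per_param: float) -> float:
    """VRAM left after loading weights, in GB (10^9 bytes); negative means it does not fit."""
    weights_gb = PARAMS * bits_per_param / 8 / 1e9
    return cards * VRAM_PER_CARD_GB - weights_gb

for cards, label, bits in [(8, "4-bit", 4), (8, "FP8", 8), (16, "FP8", 8)]:
    print(f"{cards} cards, {label}: {headroom_gb(cards, bits):.1f} GB headroom")
```

Under these assumptions, FP8 on 8 cards leaves only about 15 GB across the whole machine, which is very little room for context.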

krackers 36 minutes ago | parent | prev | next [-]

Would you be better off pooling that money with some hackerspace group and then setting up shared inference infra, so that way you at least get better utilization?

KaoruAoiShiho 4 minutes ago | parent [-]

And before you know it, you invented some openrouter provider from first principles...

InvertedRhodium 11 minutes ago | parent | prev | next [-]

Depends how much you value privacy and running uncensored models.

Personally, I’m waiting for hardware to hit the secondary market before I buy something to run unquantized models like GLM. But I have no doubt that I will, at some point.

8note an hour ago | parent | prev | next [-]

You can, however, have fun with it.

Oil workers buy $100k trucks they don't do much with. Why not $100k in computers?

afavour 10 minutes ago | parent | next [-]

Because car loans can’t be used to buy computers

Ken_At_EM 41 minutes ago | parent | prev | next [-]

I can't help but ask where this comment came from. You must have some exposure.

CamperBob2 21 minutes ago | parent [-]

It is so easy to spend $100K on a pickup truck these days, it's not even funny.

dakolli 43 minutes ago | parent | prev [-]

Sure, if you want to light money on fire for entertainment, more power to you. There are probably worse ways to light 100k on fire. If I have an extra 100k lying around, it's going to my family though.

wonnage 9 minutes ago | parent | prev | next [-]

Yeah, the neoclouds and hyperscalers are taking massive losses right now; self-hosting is basically signing yourself up to do the same. There are philosophical reasons to do so, but it's a terrible economic decision.

rekttrader 44 minutes ago | parent | prev | next [-]

Or you have data covered by HIPAA or GDPR, or data containing PII, or you have to worry about others training on your data.

dakolli 41 minutes ago | parent [-]

That too.

dist-epoch 34 minutes ago | parent | prev [-]

> 50tps for a decade

Assuming demand doesn't keep increasing. Even Google apparently has trouble getting enough capacity.