embedding-shape 5 hours ago
> $10k gets you a Mac Studio with 512GB of RAM, which definitely can run GLM-4.7 with normal, production-grade levels of quantization (in contrast to the extreme quantization that some people talk about).

Please do give that a try and report back the prefill and decode speed. Unfortunately, I think what I wrote earlier applies again:

> In practice, it'll be incredibly slow and you'll quickly regret spending that much money on it.

I'd rather put that $10k toward an RTX Pro 6000 if I were choosing between them.
rynn 4 hours ago
> Please do give that a try and report back the prefill and decode speed.

M4 Max here with 128GB of RAM. Can confirm this is the bottleneck. I considered a DGX Spark but figured the M4 would be competitive with the same amount of RAM. Not so much.
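A rough way to see why prefill and decode behave so differently: decode has to stream the active weights from memory for every generated token, so it is capped by memory bandwidth, while prefill pushes the whole prompt through large matrix multiplies, so it is capped by compute. The sketch below is a back-of-envelope estimate under assumptions that are not from the thread: roughly 32B active parameters per token (the published MoE figure for GLM-4.5, assumed to carry over to GLM-4.7), 4-bit weights, 546 GB/s (Apple's quoted bandwidth for the top M4 Max), and a hypothetical 20 TFLOPS of sustained compute. Swap in measured numbers before trusting any output.

```python
# Back-of-envelope prefill/decode estimate for one request on one machine.
# ASSUMPTIONS (placeholders, not measurements):
#   ACTIVE_PARAMS    ~32B params touched per token (GLM-4.5's MoE figure; assumed for GLM-4.7)
#   BITS             4-bit quantized weights
#   BANDWIDTH_GBPS   546 GB/s, Apple's quoted figure for the top M4 Max
#   EFFECTIVE_TFLOPS hypothetical sustained matmul throughput, not a benchmark
ACTIVE_PARAMS = 32e9
BITS = 4
BANDWIDTH_GBPS = 546
EFFECTIVE_TFLOPS = 20


def decode_tokens_per_s_ceiling(active_params: float, bits: float, bandwidth_gbps: float) -> float:
    """Upper bound on decode speed: each token streams the active weights once.

    KV-cache reads and compute time are ignored, so real speeds are lower.
    """
    bytes_per_token = active_params * bits / 8
    return bandwidth_gbps * 1e9 / bytes_per_token


def prefill_seconds(prompt_tokens: int, active_params: float, effective_tflops: float) -> float:
    """Rough prefill time: ~2 FLOPs per active parameter per prompt token.

    Attention FLOPs (which grow with prompt length) are ignored, so this underestimates.
    """
    flops = 2 * active_params * prompt_tokens
    return flops / (effective_tflops * 1e12)


if __name__ == "__main__":
    ceiling = decode_tokens_per_s_ceiling(ACTIVE_PARAMS, BITS, BANDWIDTH_GBPS)
    print(f"decode ceiling: {ceiling:.1f} tok/s")
    for prompt in (1_000, 10_000, 50_000):
        secs = prefill_seconds(prompt, ACTIVE_PARAMS, EFFECTIVE_TFLOPS)
        print(f"prefill {prompt:>6} tokens: ~{secs:.0f} s")
```

Under these assumptions the decode ceiling is tolerable, but prefill time grows linearly with prompt length, which is why long agentic or coding prompts are where a compute-limited machine feels slow.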
coder543 5 hours ago
> I'd rather put that $10k toward an RTX Pro 6000 if I were choosing between them.

One RTX Pro 6000 is not going to be able to run GLM-4.7, so it's not really a choice if that is the goal.
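The capacity argument is simple arithmetic: weight memory is roughly parameter count times bits per weight divided by 8. The sketch below assumes GLM-4.7 has about 355B total parameters (the figure published for GLM-4.5; check the model card) and compares against 96 GB of VRAM on one RTX Pro 6000 and 512 GB on the Mac Studio. It counts weights only and ignores KV cache, activations, runtime overhead, and memory the OS keeps for itself, so real requirements are higher.

```python
# Rough weight-only memory estimate for the two options in this thread.
# ASSUMPTION: GLM-4.7 has ~355B total parameters (GLM-4.5's published figure).
TOTAL_PARAMS = 355e9

# ASSUMPTION: memory capacities of the two machines being compared.
CAPACITY_GB = {
    "RTX Pro 6000 (96 GB VRAM)": 96,
    "Mac Studio (512 GB unified memory)": 512,
}


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Gigabytes (1e9 bytes) needed just to hold the weights."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4):
        need = weight_gb(TOTAL_PARAMS, bits)
        # For each machine, True means the weights alone would fit.
        fits = {name: need <= cap for name, cap in CAPACITY_GB.items()}
        print(f"{bits}-bit weights: {need:.0f} GB -> {fits}")
```

Even at 4 bits the weights alone are well past 96 GB, while 8-bit weights still fit in 512 GB, which matches the point about "production-grade" quantization on the Mac Studio.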