| ▲ | trollbridge a day ago | |
GLM-5.2 performing like it would from a good provider: 8x B200s, so $450k. (No personal experience here.) GLM-5.2, severely quantised: a 512GB Mac Studio, somewhere between $10k and $35k for a used M3. Or run it on CPU with 768GB of RAM by getting an old PowerEdge with DDR4 for around $5,000. Qwen-3.6-35b-q6 runs well on an RTX 5090 ($4000 + cost of a PC) and is mediocre on an Intel Arc B70 ($1000 + cost of a PC, plus lots of fiddling to get the setup to work right). Gemma is a good candidate for the cheaper stuff, but I lack personal experience with using it locally. | ||
| ▲ | trollbridge 14 hours ago | |
For anyone reading this, GLM-5.2 is actually a lot more accessible than that with the 1- or 2-bit quantised models (see https://unsloth.ai/docs/models/glm-5.2). Basically, one 24GB GPU (32GB would be better) plus 256GB of free system RAM, or a 256GB unified-memory machine (like a Mac). Kind of shocked they got the results they did. | ||
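For a rough feel of why 1- or 2-bit quants change the hardware picture, here is a back-of-the-envelope sketch of weight size versus bits per weight. The parameter count below is a made-up placeholder, not GLM-5.2's real figure, and the function names are just for illustration. It also ignores KV cache and runtime overhead, and real dynamic quants mix bit widths, so the average is usually above the nominal number.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: float,
         gpu_gb: float, ram_gb: float, overhead_gb: float = 0.0) -> bool:
    """True if weights plus a chosen overhead fit in GPU memory + system RAM."""
    needed = weight_footprint_gb(n_params, bits_per_weight) + overhead_gb
    return needed <= gpu_gb + ram_gb


# ASSUMPTION: hypothetical parameter count, for illustration only.
N_PARAMS = 700e9

if __name__ == "__main__":
    for bits in (16, 8, 2, 1):
        size = weight_footprint_gb(N_PARAMS, bits)
        ok = fits(N_PARAMS, bits, gpu_gb=24, ram_gb=256)
        print(f"{bits:>2} bits/weight: {size:7.1f} GB, fits 24GB GPU + 256GB RAM: {ok}")
```

Plug in the real parameter count and the effective bits per weight from the quant you actually download to get a usable estimate.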
| ▲ | csomar 5 hours ago | |
Holy AI, this shit is expensive. I was a bit skeptical (no experience here either), so I ran some calculations with Claude, and it's also giving me $350-450k to run GLM-5.2 at full precision (un-quantized). For rental on Azure, it's giving me $96–$144/hr. That translates to $22/M tokens, which is way more expensive than API pricing at z.ai. To get close to API pricing you have to seek out cheaper providers, and even that only gets you close to z.ai pricing, not lower. The caveat is that all of this is Claude math, so I'd be interested in someone more knowledgeable about the math chiming in. I was thinking that API pricing was highly inflated in order to cover subscription costs, but with these calculations it might not be? | ||
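The hourly-rate to per-token conversion is easy to sanity check by hand. A minimal sketch, assuming cost per token is just rental price divided by sustained throughput (no idle time, no batching effects); the function names are mine, and the only inputs taken from the comment above are the $96–$144/hr and $22/M figures.

```python
def cost_per_million_tokens(hourly_usd: float, tokens_per_second: float) -> float:
    """USD per 1M generated tokens at a given rental rate and sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_usd / tokens_per_hour * 1_000_000


def implied_throughput(hourly_usd: float, usd_per_million: float) -> float:
    """Tokens/second the machine must sustain for the two prices to agree."""
    tokens_per_hour = hourly_usd / usd_per_million * 1_000_000
    return tokens_per_hour / 3600


if __name__ == "__main__":
    # Figures quoted in the comment: $96-$144/hr and $22 per million tokens.
    for hourly in (96.0, 144.0):
        tps = implied_throughput(hourly, 22.0)
        print(f"${hourly:.0f}/hr at $22/M implies about {tps:.0f} tokens/s")
```

Run backwards, $22/M at those hourly rates implies roughly 1,200 to 1,800 tokens/s of aggregate throughput for the whole node. Whether that is realistic depends heavily on batch size and utilisation, which is probably where most of the gap to API pricing comes from.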