Tepix 20 hours ago

$50K seems low if you want to run, say, GLM 5.2 at 4-bit fast enough for a team of devs.

You need something like 6x RTX Pro 6000 at $11800 each plus a nice server (add $10000) = $80800 and then quite a bit of electricity.
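A quick sketch of that arithmetic, using only the prices quoted in this comment. The function name and the $50,000 budget constant are just labels for illustration, and electricity is left out because no figure is given for it.

```python
# Prices as quoted in the comment above; these are the commenter's estimates,
# not verified list prices. Electricity is excluded.
GPU_COUNT = 6
GPU_PRICE_USD = 11_800
SERVER_PRICE_USD = 10_000
BUDGET_USD = 50_000  # the $50K figure under discussion


def hardware_cost(gpu_count: int, gpu_price: int, server_price: int) -> int:
    """Up-front hardware cost: GPUs plus the host server."""
    return gpu_count * gpu_price + server_price


total = hardware_cost(GPU_COUNT, GPU_PRICE_USD, SERVER_PRICE_USD)
print(f"Hardware total: ${total:,}")
print(f"Over budget by: ${total - BUDGET_USD:,}")
```

Plugging in a different GPU count makes it easy to compare against a 1 or 2 card build at the same assumed prices.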

theYipster 16 hours ago | parent [-]

You don't need all of the model in VRAM. 1 or 2 RTX Pro 6000s will do. $50K will get you there very nicely, and on a 1600 watt PSU if you go for the MAX-Q versions. (That's the same wattage as the PSU in the machine I'm typing this on, which I've been using for the last 5 years.)

Tepix 3 hours ago | parent [-]

If you want decent performance (more than say 20 tokens/s) for your dev team, you absolutely do need all of the model in VRAM.