CamperBob2 5 hours ago

I think the 27B dense model at full precision and 122B MoE at 4- or 6-bit quantization are legitimate killer apps for the 96 GB RTX 6000 Pro Blackwell, if the budget supports it.

I imagine any 24 GB card can run the lower quants at a reasonable rate, though, and those are still very good models.
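A quick sanity check of the memory math behind those fits (a rough sketch: weight memory ≈ parameters × bits per weight ÷ 8, ignoring KV cache and runtime overhead, which add several GB in practice):

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# 27B dense at full BF16 precision (16 bits/weight): ~54 GB
print(weight_gb(27, 16))   # 54.0
# 122B MoE at 4-bit (~61 GB) or 6-bit (~91.5 GB): both under 96 GB
print(weight_gb(122, 4))   # 61.0
print(weight_gb(122, 6))   # 91.5
```

By the same arithmetic, a 24 GB card caps out around 3-bit for a model in the 60B range, or 4-bit for roughly 48B of weights, which is why only the lower quants fit there.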

Big fan of Qwen 3.5. It actually delivers on some of the hype that the previous wave of open models never lived up to.

MarsIronPI 5 hours ago

I've had good experience with GLM-4.7 and GLM-5.0. How would you compare them with Qwen 3.5? (If you have any experience with them.)

CamperBob2 3 hours ago

No experience with 5 and not much with 4.7, but they both have quite a few advocates over on /r/localllama.

Unsloth's GLM-4.7-Flash-BF16.gguf is quite fast on the 6000, at around 100 t/s, but it's definitely not as smart as the Qwen 3.5 MoE or dense models of similar size. As far as I'm concerned, Qwen 3.5 renders most other open models, short of perhaps Kimi 2.5, obsolete for general queries, although other models are still said to be better for local agentic use. That, I haven't tried.