packetlost 2 days ago

This does not match my experience with ~120B models. I run Qwen3.5 122B A10B on about 80GB of VRAM just fine.

bastawhiz a day ago

Qwen 3.5 is MoE. But you're also almost certainly running a quantized version. 120B is well over 200GB at bf16; with int4 you're looking at 60GB or so. Qwen uses relatively little KV cache (only about 2GB for 64k context). So you're not too snug, but if Qwen isn't cutting it for you, as it didn't for me, you're kind of in a pickle. For writing tasks, int4 was simply too chaotic. I also couldn't get it to use tools.
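The sizes above are just parameter count times bytes per weight. A back-of-envelope sketch (the 120B figure and bit widths are the ones discussed here, not exact for any specific checkpoint):

```python
# Approximate VRAM needed just for model weights at a given precision.
# Ignores KV cache, activations, and runtime overhead.

def weight_gb(params_billions: float, bits_per_param: float) -> float:
    """Weight memory in GB (using 1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

print(weight_gb(120, 16))  # bf16: 240.0 GB -> "well over 200GB"
print(weight_gb(120, 4))   # int4: 60.0 GB  -> fits in 80GB with headroom
```

Add the ~2GB KV cache and you can see why 80GB is workable at int4 but nowhere close at bf16.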

For me, Qwen didn't cut it. You're not fine-tuning a 120B-parameter model with 80GB, and you're probably not going to be able to abliterate it either, because it's MoE. Other options use more VRAM, and where you'd have a fair amount of buffer with Qwen, you're pressed with other big models.
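To see why fine-tuning is off the table: full fine-tuning needs far more than the weights alone. A rough sketch using the commonly cited byte counts for mixed-precision AdamW (bf16 weights and gradients, fp32 master weights, fp32 first and second moments); these are textbook ballpark figures, not measurements:

```python
# Rough full fine-tuning footprint per parameter with mixed-precision AdamW:
# bf16 weights (2) + bf16 grads (2) + fp32 master weights (4)
# + fp32 Adam m (4) + fp32 Adam v (4) = 16 bytes/param.
# Activations and any parallelism overhead come on top of this.

BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16

def finetune_gb(params_billions: float) -> float:
    """Training-state memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM / 1e9

print(finetune_gb(120))  # 1920.0 GB -- two orders of magnitude past 80GB
```

Even parameter-efficient methods like LoRA still need the full (possibly quantized) weights resident plus activations, so a single 80GB card stays tight for a 120B model.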