WithinReason 5 hours ago

I'm running Qwen3.6-27B on a single 24GB GPU at 80 tok/s. You don't even need two of them.

npodbielski 3 hours ago

Yeah, but at 4 bits it very often loops needlessly. That's not so bad, because you don't pay for tokens, but you paid for the hardware and you want to use it for something useful. Q6 is better, but then you get something like 40 t/s prefill, which is really tiring. At least it says sorry when you ask it what's wrong! I heard there's an extension for Pi that prevents the looping; I need to look into it. Otherwise I'm quite happy.
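Rough arithmetic on why a 27B model fits on one 24GB card at 4 bits but gets tight at Q6. This is a sketch under assumptions: the parameter count is taken from the "27B" in the name, the bits-per-weight figures are my approximations for llama.cpp's Q4_K_M and Q6_K, and it counts weights only. KV cache (which grows with context length), activations, and runtime overhead all come on top.

```python
GIB = 1024 ** 3


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


N_PARAMS = 27e9  # assumption: "27B" taken literally
# Assumed effective bits per weight; mixed quants like Q4_K_M vary by model.
QUANTS = {"Q4_K_M": 4.85, "Q6_K": 6.56}
VRAM_GIB = 24  # assumption: a "24GB" card exposes about 24 GiB

for name, bpw in QUANTS.items():
    w = weight_gib(N_PARAMS, bpw)
    # Whatever is left has to hold the KV cache and runtime buffers.
    print(f"{name}: ~{w:.1f} GiB of weights, ~{VRAM_GIB - w:.1f} GiB headroom")
```

At roughly 15 GiB of weights, the 4-bit quant leaves a comfortable margin for context. Q6 leaves only a few GiB, so a long context may not fit without offloading.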

Zambyte 2 hours ago

"Very often" sounds like a lot more than what I've seen. I've been using Qwen 3.6 27b Q4 in Pi (without any anti-looping extension) daily for weeks now, and it has gotten stuck in an infinite loop maybe 3 or 4 times.

Der_Einzige 2 hours ago

You can fix looping with proper repetition penalties. Turn on the one called “DRY”, which p-e-w invented and which got merged into llama.cpp.
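A simplified sketch of the idea behind DRY ("Don't Repeat Yourself"), assuming token IDs are plain ints. If the context ends with a sequence that already appeared earlier, the token that followed it last time is penalized by `multiplier * base ** (match_length - allowed_length)`, so longer verbatim repeats get pushed down exponentially harder. This is not llama.cpp's code, and the values 0.8 / 1.75 / 2 are commonly suggested settings that I'm treating as assumptions. As far as I know, llama.cpp ships DRY disabled and it is turned on with a nonzero `--dry-multiplier`.

```python
def dry_penalties(tokens, multiplier=0.8, base=1.75, allowed_length=2,
                  breakers=frozenset()):
    """Return {candidate_token: penalty} to subtract from next-token logits.

    A candidate t is penalized when the context's current suffix occurred
    earlier and was followed by t, i.e. emitting t would continue a repeat.
    `breakers` are tokens (e.g. newlines) that stop a match from extending.
    """
    if not tokens or tokens[-1] in breakers:
        return {}
    last = tokens[-1]
    n = len(tokens)
    match_lengths = {}
    # Every earlier occurrence of the last token may be the end of a repeat.
    for i in range(n - 1):
        if tokens[i] != last:
            continue
        next_tok = tokens[i + 1]
        if next_tok in breakers:
            continue
        # Walk backwards while the tokens before position i mirror the
        # tokens before the end of the context.
        length = 1
        while True:
            j = i - length
            if j < 0:
                break
            prev = tokens[n - 1 - length]
            if tokens[j] != prev or prev in breakers:
                break
            length += 1
        match_lengths[next_tok] = max(length, match_lengths.get(next_tok, 0))
    # Only repeats longer than allowed_length are punished, exponentially.
    return {
        tok: multiplier * base ** (length - allowed_length)
        for tok, length in match_lengths.items()
        if length >= allowed_length
    }


if __name__ == "__main__":
    context = [1, 2, 3, 4, 1, 2, 3]  # "1 2 3" is being repeated
    logits = {4: 2.0, 5: 1.0}  # hypothetical logits for two candidates
    for tok, pen in dry_penalties(context).items():
        if tok in logits:
            logits[tok] -= pen
    print(logits)
```

Unlike a flat repetition penalty, this leaves common short n-grams alone and only bites once the model starts copying a longer span verbatim, which is what a loop looks like.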