Aurornis 7 hours ago

> I did not do a very long session

This is always the problem with the 2-bit and even 3-bit quants: They look promising in short sessions but then you try to do real work and realize they’re a waste of time.

Running a smaller dense model like 27B produces better results than 2-bit quants of larger models in my experience.

amelius 3 hours ago | parent | next

> This is always the problem with the 2-bit and even 3-bit quants: They look promising in short sessions but then you try to do real work and realize they’re a waste of time.

It would be nice to see a scientific assessment of that statement.

singpolyma3 5 hours ago | parent | prev

Lots of people seem to use 4bit. Do you think that's worth it vs a smaller model in some cases?

Aurornis 4 hours ago | parent | next

4-bit is as low as I like to go. There are KLD and perplexity tests comparing quantizations where you can see the degradation curve, but perplexity and KLD numbers can be misleading compared to real-world use, where small errors compound over long sessions.

In my anecdotal experience, I've been happier with Q6 for Qwen3.5 27B, despite its tradeoffs, than with Q4.
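For anyone unfamiliar with the KLD comparisons mentioned above: the idea is to compare the next-token distribution of the quantized model against the full-precision model at each position, and average the KL divergence over a corpus. A minimal sketch (all logit values here are made-up placeholders, not real model outputs):

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q): how much the quantized distribution Q diverges
    from the full-precision distribution P at one token position."""
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical logits for a single position: full-precision vs. quantized.
full = [2.0, 1.0, 0.5, -1.0]
quant = [1.8, 1.1, 0.4, -0.8]
print(f"KLD at this position: {kl_divergence(full, quant):.4f}")
```

A per-position average like this is exactly the kind of number that can look small on paper while errors still compound across a long session, which is the caveat above.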

hnfong 5 hours ago | parent | prev [-]

Generally the perplexity charts indicate that quality drops off sharply below 4-bit, so in that sense 4-bit is the sweet spot if you're resource-constrained.
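For reference, the perplexity those charts plot is just the exponential of the average negative log-likelihood the model assigns to each token of a test corpus (lower is better; a rising curve at 2-3 bits is the quality drop). A minimal sketch with made-up per-token log-probs:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over a token sequence."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical log-probabilities a model assigned to four tokens.
lps = [-1.2, -0.7, -2.3, -0.9]
print(f"perplexity: {perplexity(lps):.3f}")
```

A model that assigned probability 1.0 to every token would score a perfect perplexity of 1.0; heavier quantization pushes the score upward.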