simonw 2 hours ago
Anyone know how hard it would be to create a 1-bit variant of one of the recent Qwen 3.5 models?
regularfry an hour ago
There are q2 and q1 quants, if you want an idea of how much performance you'd drop. Not quite the same implementation-wise, but probably equivalent in terms of smarts.
nikhizzle 2 hours ago
Almost trivial using open source tools; the real question is how it performs without calibration or fine-tuning.
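To make the calibration point concrete, here's a toy sketch of the "trivial" path: round-to-nearest symmetric quantization with a single per-tensor scale and no calibration data. The function name and the Gaussian test weights are made up for illustration; real tools (e.g. llama.cpp quant types) use group-wise scales and smarter rounding.

```python
import numpy as np

def rtn_quantize(w, bits):
    # Naive round-to-nearest symmetric quantization: one scale per
    # tensor, chosen from the largest absolute weight, no calibration.
    # (`rtn_quantize` is a hypothetical helper, not a real library API.)
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale  # dequantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)
for bits in (8, 4, 2):
    mse = np.mean((w - rtn_quantize(w, bits)) ** 2)
    print(f"{bits}-bit MSE: {mse:.5f}")
```

The reconstruction error explodes as the bit width drops, which is why sub-4-bit quants lean on calibration data or outright retraining rather than plain rounding.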
wongarsu 2 hours ago
The results would probably be underwhelming. The BitNet paper doesn't give great baselines to compare against, but in their tests a 2B network trained natively at 1.58 bits with their architecture beat a Llama 3 8B quantized down to 1.58 bits. That said, the same 2B network was only about on par with a 1.5B Qwen2.5. If you have an existing network, an int4 quant is the better tradeoff; 1.58-bit quants only become interesting when you train the model for that precision from the start.

On the other hand, maybe it works much better than expected, because Llama 3 is just a terrible baseline.
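For a feel of what 1.58-bit weights actually look like, here's a sketch of the absmean ternary quantization described in the BitNet b1.58 paper: scale each weight matrix by its mean absolute value, then round every entry to {-1, 0, +1}. The function name is made up; this is illustrative numpy, not the paper's training-time implementation.

```python
import numpy as np

def absmean_ternary(w, eps=1e-6):
    # BitNet b1.58-style weight quantization (sketch): divide by the
    # mean absolute value, then round-and-clip to the ternary set
    # {-1, 0, +1}. log2(3) ~= 1.58 bits per weight, hence the name.
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale  # dequantize as q * scale

q, s = absmean_ternary(np.array([0.8, -0.05, 0.3, -1.2]))
print(q)  # every entry is -1, 0, or +1
```

Applying this post hoc to a network trained in fp16 is exactly the lossy conversion the paper argues against; trained for it from scratch, the matmuls reduce to additions and subtractions.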