wongarsu 2 hours ago
The results would probably be underwhelming. The BitNet paper doesn't give great baselines to compare against, but in their tests a 2B network trained at 1.58 bits with their architecture beat Llama 3 8B quantized to 1.58 bits, though that 2B network was only about on par with a 1.5B Qwen2.5. If you have an existing network, an int4 quant is the better tradeoff; 1.58-bit quants only become interesting when you train the model specifically for them.

On the other hand, maybe it works much better than expected because Llama 3 is just a terrible baseline.
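For context, the "1.58 bits" comes from constraining each weight to {-1, 0, +1} (log2(3) ≈ 1.58 bits of information per weight). A minimal numpy sketch of the absmean ternary scheme described in the BitNet b1.58 paper (per-tensor scale for simplicity; the function names are mine, not from the paper):

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize weights to {-1, 0, +1} times a shared scale.

    Absmean scheme: divide by the mean absolute value,
    then round and clip to the ternary set.
    """
    scale = np.abs(w).mean()                        # per-tensor absmean scale
    q = np.clip(np.round(w / (scale + eps)), -1, 1)
    return q.astype(np.int8), float(scale)          # ternary codes + fp scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([[0.9, -0.05, -1.2], [0.3, 0.0, -0.4]], dtype=np.float32)
q, s = ternary_quantize(w)
w_hat = dequantize(q, s)  # coarse reconstruction; fine weight detail is lost
```

This is why post-hoc 1.58-bit quantization of a normal checkpoint loses so much: the reconstruction is extremely coarse unless the network was trained with the ternary constraint in the loop.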