londons_explore 3 hours ago:
How is the research on training these models directly in their quantized state going? That'll be the real game changer.

sigmoid10 2 hours ago:
The original BitNet was natively trained at 1.58 bits (ternary weights). PrismML has not released any actual information on how they trained, but since their models are based on Qwen, some downstream quantization was certainly involved.
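For context, the quantizer BitNet b1.58 describes maps each weight tensor to ternary values with an absmean scale. Below is a minimal pure-Python sketch of that rounding step; the function name is mine, and note that even this "native" training keeps latent floating-point weights that are updated through a straight-through estimator, with the ternary values used in the forward pass:

```python
# Sketch of BitNet b1.58-style absmean ternary quantization.
# Names are illustrative; the real pipeline applies this per weight
# tensor during the forward pass, while gradients still update latent
# floating-point weights via a straight-through estimator.

def absmean_ternary(weights, eps=1e-8):
    """Map float weights to codes in {-1, 0, +1} plus one shared scale."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

codes, scale = absmean_ternary([0.31, -0.02, -0.75, 0.48])
print(codes)  # each code is -1, 0, or 1
print(scale)  # roughly the mean absolute weight
```

The latent floating-point copy is exactly what makes this "quantization-aware training" rather than training directly in the quantized state.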

cubefox 44 minutes ago:
This is the only paper that really does this: https://proceedings.neurips.cc/paper_files/paper/2024/hash/7... They train directly in the 1-bit domain, without any floating-point weights. For gradient descent / backpropagation they don't use the classical (Newton-Leibniz) derivative, which operates on approximations of real numbers; instead they invented a binary counterpart they call "Boolean variation". I don't know why this paper didn't get more attention.
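The paper's actual "Boolean variation" calculus isn't reproduced here, but the general idea of updating discrete weights with no floating-point shadow copy can be illustrated with a deliberately crude greedy flip heuristic (my own toy, not the paper's method):

```python
# Toy of latent-free discrete training: weights live only in {-1, +1}
# and are updated by flipping, never by float gradient steps.
# This greedy flip rule is my own illustration, NOT the paper's
# "Boolean variation" calculus.

def loss(w, xs, ys):
    """Mean squared error of the linear model y_hat = sum_i w_i * x_i."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

def train_flips(w, xs, ys, passes=5):
    """Coordinate-wise: keep a flip only if it lowers the loss."""
    for _ in range(passes):
        for i in range(len(w)):
            before = loss(w, xs, ys)
            w[i] = -w[i]            # trial flip; weight stays in {-1, +1}
            if loss(w, xs, ys) >= before:
                w[i] = -w[i]        # revert: the flip did not help
    return w

xs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # toy one-hot inputs
ys = [1, -1, 1]                          # targets representable by sign weights
w = train_flips([-1, -1, -1], xs, ys)
print(w)  # converges to [1, -1, 1] on this toy data
```

At no point does a real-valued weight exist; the "update signal" is a discrete keep/revert decision, which is the property that distinguishes this family of methods from straight-through-estimator training.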