yieldcrv 2 days ago

They aren’t. There is a 1.58-bit version of DeepSeek that’s about 200 GB instead of 700 GB.

logicchains 2 days ago

That's not a real BitNet. It's just a post-training quantisation, and its performance suffers compared with a model trained from scratch at 1.58 bits.
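
For anyone wondering where "1.58 bits" comes from and what post-training quantisation looks like in the simplest case, here is a rough sketch. It assumes a plain round-to-nearest ternary scheme with an absmean scale applied to weights that are already trained; the function names and the sample weights are made up for illustration, and this is not necessarily how the DeepSeek build mentioned above was produced.

```python
import math

def ternary_quantize(weights):
    """Map already-trained weights to {-1, 0, +1} plus one shared scale.

    Assumption for this sketch: scale = mean absolute weight (absmean),
    then round to the nearest integer and clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes."""
    return [c * scale for c in codes]

# Three possible states per weight, so log2(3) bits of information each.
bits_per_weight = math.log2(3)

# Hypothetical weights standing in for one row of a trained layer.
weights = [0.9, -0.05, 0.4, -1.2, 0.02, 0.6]
codes, scale = ternary_quantize(weights)
approx = dequantize(codes, scale)

# Worst-case reconstruction error. Nothing in the model was trained to
# compensate for it, which is the point being made about quality loss.
error = max(abs(w - a) for w, a in zip(weights, approx))

print(f"bits per weight: {bits_per_weight:.3f}")
print("codes:", codes)
print(f"max abs error: {error:.3f}")
```

A model trained from scratch at this precision sees the ternary constraint during training, so the weights can adapt to it. Here the rounding error is simply imposed afterwards.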