radarsat1 3 hours ago

I'm curious if 1-bit params can be compared to 4- or 8-bit params. I imagine that 100B is equivalent to something like a 30B model? I guess only evals can say. Still, being able to run a 30B model at good speed on a CPU would be amazing.

regularfry an hour ago | parent [-]

At some point you hit information limits. With conventional quantisation you see a marked capability fall-off below q5. All else being equal, you'd expect an N-parameter 5-bit quant to be roughly comparable to a 3N-parameter ternary model, if they are trained to the same level, purely in terms of the amount of information they can possibly hold. So yes, a 100B ternary model would be in the ballpark of a 30B q5 conventional model, with a lot of hand-waving and sufficiently smart training assumed.
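The back-of-the-envelope arithmetic behind that 3N claim can be checked directly: ternary weights {-1, 0, +1} carry log2(3) ≈ 1.58 bits each, so 100B ternary parameters and 30B 5-bit parameters hold nearly the same raw number of bits. A minimal sketch (the function name is just for illustration):

```python
import math

def capacity_bits(params, bits_per_param):
    """Raw information capacity: parameter count times bits per parameter."""
    return params * bits_per_param

# A ternary weight has 3 possible values, i.e. log2(3) ~ 1.585 bits.
ternary_bits = math.log2(3)

cap_100b_ternary = capacity_bits(100e9, ternary_bits)
cap_30b_q5 = capacity_bits(30e9, 5)

print(cap_100b_ternary / cap_30b_q5)  # ~1.06, i.e. roughly comparable
```

This only bounds what the weights *could* store; whether training actually packs in that much usable information is the hand-waving part.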

cubefox 20 minutes ago | parent [-]

I assume that, theoretically, 1-bit models could be the most efficient, given that modern models have already moved from 32-bit to 16-bit to 8-bit per parameter (natively, without quantization).