brentd 8 hours ago

Regardless of whether the convergence is superficial, I'm especially interested in what this could mean for future compression of weights. Quantization of models is currently fairly crude (per my limited understanding). Could exploitable patterns make it smarter?

ACCount37 8 hours ago | parent [-]

That's more of a "quantization-aware training" thing, really.
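For context, quantization-aware training inserts a "fake quantization" step into the forward pass, so the network learns weights that survive rounding to a low-precision grid. A minimal sketch of the idea (the per-tensor scaling choice and all names here are illustrative, not from any particular library):

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Simulate low-precision storage: snap weights to a uniform integer
    grid, then return them as floats. In QAT this runs during the forward
    pass; the backward pass typically uses a straight-through estimator,
    i.e. gradients flow as if this function were the identity."""
    qmax = 2 ** (num_bits - 1) - 1               # e.g. 127 for int8
    scale = np.max(np.abs(w)) / qmax             # simple per-tensor scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # integer codes
    return q * scale                             # dequantized view

w = np.array([0.33, -1.27, 0.05, 0.91])
w_q = fake_quantize(w, num_bits=8)
# rounding error per weight is bounded by scale / 2
print(np.max(np.abs(w - w_q)))
```

Post-training quantization applies a step like this once, after training; QAT's advantage is that the optimizer sees the rounding error during training and can route around it.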
