brentd 8 hours ago

Regardless of whether the convergence is superficial, I'm especially interested in what this could mean for future compression of weights. Quantization of models is currently fairly crude (per my limited understanding). Could exploitable patterns make it smarter?

ACCount37 8 hours ago | parent [-]

That's more of a "quantization-aware training" thing, really.
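For context, quantization-aware training inserts a "fake quantization" step into the forward pass, so the network learns weights that survive rounding to a low-precision grid. A minimal sketch of the idea (the per-tensor scaling choice and all names here are illustrative, not from any particular library):

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Simulate low-precision storage: snap weights to a uniform integer
    grid, then return them as floats. In QAT this runs during the forward
    pass; the backward pass typically uses a straight-through estimator,
    i.e. gradients flow as if this function were the identity."""
    qmax = 2 ** (num_bits - 1) - 1               # e.g. 127 for int8
    scale = np.max(np.abs(w)) / qmax             # simple per-tensor scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # integer codes
    return q * scale                             # dequantized view

w = np.array([0.33, -1.27, 0.05, 0.91])
w_q = fake_quantize(w, num_bits=8)
# rounding error per weight is bounded by scale / 2
print(np.max(np.abs(w - w_q)))
```

Post-training quantization applies a step like this once, after training; QAT's advantage is that the optimizer sees the rounding error during training and can route around it.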
