brentd 8 hours ago
Regardless of whether the convergence is superficial or not, I am especially interested in what this could mean for future compression of weights. Quantization of models is currently very dumb (per my limited understanding). Could exploitable patterns make it smarter?
ACCount37 8 hours ago | parent
That's more of a "quantization-aware training" thing, really.
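For context on the "dumb" quantization being discussed: a minimal sketch of naive round-to-nearest int8 quantization, where one scale covers an entire weight tensor and no structure in the weights is exploited. This is illustrative only (real quantizers typically use per-channel scales, calibration data, or quantization-aware training as mentioned above); the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w):
    # One scale for the whole tensor -- this is the "dumb" part:
    # every weight is rounded on the same uniform grid, ignoring
    # any patterns or outlier structure in the weights.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruct an approximation of the original float weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Round-to-nearest bounds the per-weight error by half a grid step.
err = np.abs(w - w_hat).max()
```

Quantization-aware training, by contrast, simulates this rounding during the forward pass so the weights learn to sit near representable grid points, rather than quantizing a finished model after the fact.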