fxj 5 hours ago

Learning == Compression of information.

It can be a description with a shorter bit length. Think of Shannon entropy as the measure of information content. The information is still in the weights, but it has been reorganized: the reconstructed sentences (or lists of tokens) will not reproduce the exact same bits, yet the information is still there.
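A minimal sketch (my own illustration, not from the thread) of the Shannon-entropy view: the empirical entropy of a token stream is a lower bound on the average bits per token an ideal compressor needs, which is the sense in which "learning == compression" is usually framed. Function names and the toy strings are assumptions for illustration only.

    import math
    from collections import Counter

    def shannon_entropy_bits(tokens):
        # Empirical Shannon entropy H = -sum(p * log2(p)) over token frequencies;
        # a lower bound on the average bits per token an ideal compressor needs.
        counts = Counter(tokens)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    # Toy example: a repetitive sequence is cheaper to describe than a varied one.
    print(shannon_entropy_bits("aaaaaaab"))  # ~0.54 bits/token
    print(shannon_entropy_bits("abcdefgh"))  # 3.0 bits/token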

shawntan 4 hours ago | parent

The compression is lossy.