_hark 2 days ago
Say we have a dataset of D bytes, and I find a predictive model M of the data, where M itself takes N bytes to describe. Furthermore, say the entropy of the dataset under the model (its code length when encoded using the model's predictions) is H bytes. Then, if N + H < D, my model compresses the data.

It doesn't matter whether the model is deterministic or probabilistic: a probabilistic model can be used to (losslessly) compress a dataset with entropy coding.

One more argument for compression being equivalent to intelligence: across many fields of statistical machine learning, there are generalization bounds of the form:

test error <= train error + model complexity

That is, we don't expect to do any worse on new (test) data than the sum of the train error and the model complexity (the model's smallest compressed size). Notice that if you interpret the train error as the entropy of the data (i.e. the error under a cross-entropy loss), then the models which minimize the right-hand side of the bound correspond to those which best compress the data. In other words: the model which produces the shortest description of the data is the one which is expected to generalize best.
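To make the N + H < D check concrete, here is a minimal sketch, not anything from the comment itself. It assumes an order-0 byte-frequency model, a made-up cost of 5 bytes per table entry for storing the model, and the ideal entropy-coded length (an actual arithmetic coder would add a small overhead). The helper names `order0_model`, `model_size_bytes`, and `code_length_bytes` are hypothetical.

```python
import math
from collections import Counter


def order0_model(data: bytes) -> dict[int, float]:
    """Fit a trivial 'predictive model': one fixed probability per byte value."""
    counts = Counter(data)
    total = len(data)
    return {sym: c / total for sym, c in counts.items()}


def model_size_bytes(model: dict[int, float]) -> int:
    """N: assumed storage cost, 1 byte for the symbol + 4 bytes for its probability."""
    return 5 * len(model)


def code_length_bytes(data: bytes, model: dict[int, float]) -> float:
    """H: ideal entropy-coded length of the data under the model, in bytes."""
    bits = -sum(math.log2(model[b]) for b in data)
    return bits / 8


data = b"abracadabra " * 200
D = len(data)                     # raw size
M = order0_model(data)
N = model_size_bytes(M)           # cost of describing the model
H = code_length_bytes(data, M)    # cost of the data given the model

print(f"D = {D}, N = {N}, H = {H:.1f}, N + H = {N + H:.1f}")
print("model compresses the data:", N + H < D)
```

A better model (say, one that conditions on the previous bytes) would lower H at the cost of a larger N, which is exactly the trade-off the inequality captures.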
vlovich123 20 hours ago
> It doesn't matter if the model is deterministic or probabilistic: a probabilistic model can be used to (losslessly) compress a dataset with entropy coding.

But if you can choose to lose information you can obviously achieve a higher compression score. That's literally what optical & auditory compression exploits. Indeed, we know people generally don't memorize an entire Wikipedia article. Rather, they convert what they learn into some internally consistent story that they can then recite at any time, and each time they recite it, it's worded differently (maybe memorizing some facts that help solidify the story).

Again, I have no problem with compression and decompression being equated to intelligence (or at least one facet of intelligence), provided both are allowed to be lossy. That's because you get to inject structure into the stored representation that may not otherwise exist in the original data, and you get to choose how to hydrate that representation. That's why LZMA isn't "more intelligent" than ZIP - the algorithm itself is "smarter" at compression, but you're not getting to AGI by working on a better LZMA. It's also why H264 and MP3 aren't intelligent either: while compression is lossy, decompression is deterministic. That's why we can characterize LLMs as "more intelligent" than LZMA even though LZMA compresses losslessly better than LLMs.
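A small sketch of the first point, that discarding information buys a better compression ratio. It is only an illustration under assumptions: the signal is a synthetic 8-bit sine wave with a little seeded noise, the "lossy codec" is just dropping the low 4 bits of each sample, and `zlib` stands in for the lossless back end. Note that decompression stays deterministic in both cases.

```python
import math
import random
import zlib

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic 8-bit signal: a slow sine wave plus small random noise.
samples = bytes(
    min(255, max(0, int(128 + 100 * math.sin(i / 50) + random.randint(-3, 3))))
    for i in range(10_000)
)

# Lossless path: every bit of the noise has to be stored.
lossless = zlib.compress(samples, 9)

# Lossy path (assumed toy scheme): throw away the low 4 bits, then compress.
quantized = bytes(s & 0xF0 for s in samples)
lossy = zlib.compress(quantized, 9)

print("raw bytes:     ", len(samples))
print("lossless bytes:", len(lossless))
print("lossy bytes:   ", len(lossy))

# Decompression is deterministic either way; only the lossy path
# fails to give back the original samples.
print("lossless round-trips:", zlib.decompress(lossless) == samples)
print("lossy round-trips:   ", zlib.decompress(lossy) == samples)
```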
dooglius 2 days ago
This seems to assume that there is a tractable way to encode H efficiently, but that looks very difficult for a model that is focused on understanding the content. For example: I can easily write a program that can do basic arithmetic, but given, say, a bitmap scan of elementary school math materials, such a program gives me no easy way to compress it; rather, something generic like PNG (which does not know or understand the material) will far outperform it.
|