_hark | 19 hours ago
I agree with you in spirit. I just thought you might be interested in some of the technical details regarding the relationship between compression and generalization! I'll have a paper out next week which makes your point precise, using the language of algorithmic rate-distortion theory (lossy compression applied to algorithmic information theory + neural nets). I think another way of understanding this is through the "Value Equivalence Principle", which points out that if we are learning a model of our environment, we don't want to model everything in full detail; we only want to model the things that affect our value function, since that is what determines how we act. In this sense, the value function implies a distortion function, and we can define lossy compression relative to it.
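A minimal sketch of that last idea (all names here are hypothetical, for illustration only): measure the "distortion" between two environment models not by raw prediction error, but by how much their one-step value backups disagree. Two models that differ only in value-irrelevant details then incur zero distortion, which is the Value Equivalence intuition.

```python
def value_distortion(true_model, approx_model, value_fn, states, actions):
    """Worst-case disagreement between two models' Bellman backups.

    Each model is a callable (state, action) -> list of (next_state, prob)
    pairs; value_fn maps states to values. Distortion is measured only
    through the value function, not through the raw transition predictions.
    """
    d = 0.0
    for s in states:
        for a in actions:
            backup_true = sum(p * value_fn[s2] for s2, p in true_model(s, a))
            backup_approx = sum(p * value_fn[s2] for s2, p in approx_model(s, a))
            d = max(d, abs(backup_true - backup_approx))
    return d


# Two models that predict different next states...
true_m = lambda s, a: [(0, 1.0)]
approx_m = lambda s, a: [(1, 1.0)]

# ...are "equivalent" (zero distortion) under a value function that
# doesn't distinguish those states:
flat_v = {0: 1.0, 1: 1.0}
print(value_distortion(true_m, approx_m, flat_v, [0, 1], [0]))  # 0.0

# ...but not under a value function that does:
sharp_v = {0: 1.0, 1: 0.0}
print(value_distortion(true_m, approx_m, sharp_v, [0, 1], [0]))  # 1.0
```

The point of the sketch: the same pair of models can be lossily "the same" or meaningfully different depending on the value function, which is exactly what lets the value function play the role of a distortion measure.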