XenophileJKO 3 hours ago
It's really way more interesting than that. The act of compression builds up behaviors/concepts of greater and greater abstraction. Another way you could think about it is that the model learns to extract commonality, hence the compression. Because it is learning higher-level abstractions AND the relationships between those abstractions, it can ABSOLUTELY learn to infer or apply things way outside its training distribution.
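
One concrete way to see the prediction/compression link this leans on (a minimal toy sketch, not a claim about how any particular LLM is trained): under arithmetic coding, a symbol the model assigns probability p costs about -log2(p) bits, so a model that has extracted the underlying regularity encodes the same data in far fewer bits than one that hasn't.

    import math

    # Toy illustration: a predictive model's cross-entropy IS a code length.
    # Under arithmetic coding, a symbol with model probability p costs
    # -log2(p) bits, so extracting commonality (better prediction)
    # means better compression.

    text = "ab" * 8  # data with an obvious regularity: a and b alternate

    # Model 1: no abstraction learned -- uniform guess over {a, b}.
    uniform_bits = sum(-math.log2(0.5) for _ in text)

    # Model 2: has learned the abstraction "the symbols alternate",
    # so it predicts each next symbol with high confidence.
    p = 0.99  # hold back a little probability mass for surprises
    learned_bits = sum(-math.log2(p) for _ in text)

    print(f"uniform model: {uniform_bits:5.1f} bits")  # 16.0 bits
    print(f"learned model: {learned_bits:5.1f} bits")  # ~0.2 bits

The fewer bits the model needs, the more of the data's shared structure it has internalized; that internalized structure is exactly the abstraction being described.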