GuB-42 3 days ago
There are a lot of parallels between AI and compression. In fact, the best compression algorithms and LLMs have one thing in common: they work by predicting the next symbol (a token for an LLM, a bit or byte for a compressor). Compression algorithms take an extra step called entropy coding to efficiently encode the difference between the prediction and the actual data, and the better the prediction, the better the compression ratio. What makes an LLM "lossy" is that it doesn't have the "encode the difference" step. And yes, it means you can turn an LLM into a (lossless) compression algorithm, and I think a really good one in terms of compression ratio on huge data sets. You can also turn a compression algorithm like gzip into a language model! A terrible one, but the output is better than a random stream of bytes.
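The gzip-as-language-model idea can be sketched in a few lines. This is only one possible reading of the comment, not an established recipe: rank candidate continuations by how few bytes they add to the compressed context, on the assumption that text the compressor finds predictable is text it "expects". The helper names `compressed_size` and `predict_next` are made up for this illustration, and `zlib` is used because it implements the same DEFLATE algorithm as gzip.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the DEFLATE-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def predict_next(context: bytes, candidates: list) -> bytes:
    """Pick the candidate continuation that compresses best after the context.

    A continuation that repeats patterns already present in the context
    extends an existing match and costs almost nothing, while an unseen
    string has to be stored as literal bytes.
    """
    return min(candidates, key=lambda c: compressed_size(context + c))


# Hypothetical toy corpus: one sentence repeated, then cut off mid-sentence.
context = b"the cat sat on the mat. " * 20 + b"the cat sat on the "
best = predict_next(context, [b"xqz!", b"mat."])
```

As the comment says, this makes a poor language model: compressed sizes only change in whole bytes, so many candidates tie, and the "model" knows nothing beyond literal repeats inside its window.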
jparishy 3 days ago
I suspect this ends up being pretty important for the next advancements in AI, specifically LLM-based AI. To me, the transformer architecture is a sort of compression algorithm that is being exploited for emergent behavior at the margins. But I think this is more like stream of consciousness than premeditated thought. Eventually I think we'll figure out a way to "think" in latent space and have our existing AI models be just the mouthpiece. In my experience as a human, the more you know about a subject, or even the more you have simply seen content about it, the easier it is to ramble on about it convincingly. It's like a mirroring skill, and it does not actually mean you understand what you're saying. LLMs seem to do the same thing, I think. At scale this is widely useful, though; I am not discounting it. I just think it's an order of magnitude below what's possible, and all this talk of existing stream-of-consciousness-like LLMs creating AGI seems like a miss.
layer8 3 days ago
One difference is that compression gives you one and only one thing when decompressing. Decompression isn't a function taking arbitrary additional input and producing potentially arbitrary, nondeterministic output based on it. We would have very different conversations if LLMs were things that merely exploded into a singular lossy-expanded version of Wikipedia, but where looking at the article for any topic X would give you the exact same article each time.
arjvik 3 days ago
With a handy trick called arithmetic coding, you can actually turn an LLM into a lossless compression algorithm!
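To make the arithmetic coding trick concrete, here is a minimal sketch. It assumes a toy adaptive byte-frequency model (`AdaptiveByteModel`, a hypothetical stand-in for an LLM's next-token distribution) and uses exact `Fraction` arithmetic instead of the fixed-precision integer tricks a practical coder would need. The encoder narrows an interval by each symbol's predicted probability, and the decoder replays the same predictions to recover the data exactly.

```python
from fractions import Fraction
from math import ceil


class AdaptiveByteModel:
    """Toy predictor (stand-in for an LLM): byte counts seen so far, Laplace-smoothed."""

    def __init__(self):
        self.counts = [1] * 256  # start at 1 so no byte ever has probability zero
        self.total = 256

    def update(self, symbol: int) -> None:
        self.counts[symbol] += 1
        self.total += 1


def encode(data: bytes):
    """Return (code, k): the message is represented by the k-bit fraction code / 2**k."""
    model = AdaptiveByteModel()
    low, width = Fraction(0), Fraction(1)
    for s in data:
        cum = sum(model.counts[:s])  # cumulative count of all bytes below s
        low += width * Fraction(cum, model.total)
        width *= Fraction(model.counts[s], model.total)  # shrink by P(s)
        model.update(s)
    k = 0
    while Fraction(1, 2 ** k) > width:  # smallest k with 2**-k <= interval width
        k += 1
    code = ceil(low * 2 ** k)  # then code / 2**k is guaranteed to lie in [low, low + width)
    return code, k


def decode(code: int, k: int, n: int) -> bytes:
    """Recover n bytes by replaying the same model the encoder used."""
    model = AdaptiveByteModel()
    value = Fraction(code, 2 ** k)
    low, width = Fraction(0), Fraction(1)
    out = bytearray()
    for _ in range(n):
        scaled = (value - low) / width * model.total  # position in cumulative counts
        cum = 0
        for s in range(256):  # find the byte whose cumulative range contains it
            if cum + model.counts[s] > scaled:
                break
            cum += model.counts[s]
        low += width * Fraction(cum, model.total)
        width *= Fraction(model.counts[s], model.total)
        out.append(s)
        model.update(s)
    return bytes(out)


message = b"abracadabra abracadabra abracadabra"
code, bits = encode(message)
restored = decode(code, bits, len(message))
```

The number of bits `k` comes out as the negative base-2 logarithm of the product of the predicted probabilities, rounded up, so a model that assigns higher probability to what actually comes next yields a shorter code. Swapping the byte counter for an LLM's token probabilities would follow the same shape, provided the encoder and decoder can reproduce exactly the same predictions.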