| ▲ | MisterTea 7 hours ago |
This is something I have been curious about in terms of how an LLM achieves compression. I would like to know what deviations show up in the output, as this almost feels like a game of telephone where each re-compression loses some data that is then incorrectly reconstructed. Sort of like misremembering a story: as you retell it over time, the details change slightly.
| ▲ | Scaevolus 7 hours ago |
When LLMs predict the next token, they actually produce a probability distribution over all possible next tokens, and the sampler chooses one of them, not necessarily the most likely one! If instead you take the actual next token of the text you want to compress and encode its position in that cumulative distribution (a number in [0, 1]) using arithmetic coding, you can run the same operation in reverse to achieve lossless compression. The tricky part is ensuring that your LLM executes absolutely deterministically, because the encoder and decoder must see exactly the same probability distribution at each step.
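A minimal sketch of that interval-narrowing idea in Python (not a real codec): next_token_probs is a hypothetical stand-in for a deterministic LLM forward pass, and Python floats stand in for the fixed-precision integer arithmetic and renormalization a real arithmetic coder would need.

    def next_token_probs(prefix):
        # Hypothetical model: in practice this would be a deterministic LLM
        # forward pass over `prefix` returning P(token | prefix) for the vocab.
        vocab = ["a", "b", "c"]
        return {t: 1.0 / len(vocab) for t in vocab}

    def encode(tokens):
        """Narrow an interval [low, high) by the CDF slice of each actual token."""
        low, high = 0.0, 1.0
        prefix = []
        for tok in tokens:
            probs = next_token_probs(prefix)
            cum = 0.0
            for t, p in probs.items():          # walk the CDF in a fixed order
                if t == tok:
                    width = high - low
                    low, high = low + cum * width, low + (cum + p) * width
                    break
                cum += p
            prefix.append(tok)
        # Any number inside [low, high) identifies the whole token sequence;
        # writing it takes about -log2(high - low) bits, i.e. the model's surprise.
        return (low + high) / 2

    def decode(code, n_tokens):
        """Run the same model and pick whichever token's CDF slice contains `code`."""
        low, high = 0.0, 1.0
        prefix = []
        for _ in range(n_tokens):
            probs = next_token_probs(prefix)
            cum = 0.0
            for t, p in probs.items():
                width = high - low
                lo_t, hi_t = low + cum * width, low + (cum + p) * width
                if lo_t <= code < hi_t:
                    low, high = lo_t, hi_t
                    prefix.append(t)
                    break
                cum += p
        return prefix

    msg = ["a", "c", "b"]
    assert decode(encode(msg), len(msg)) == msg

The round trip only works because encode and decode call the exact same model in the exact same way, which is the determinism requirement mentioned above; any mismatch between the two runs desynchronizes the intervals.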
| ▲ | AnotherGoodName 6 hours ago |
Yes. The secret is in understanding arithmetic coding: https://en.wikipedia.org/wiki/Arithmetic_coding

Arithmetic coding takes a prediction of the next bit and writes out exactly as many bits as needed to correct that prediction. The amazing part is that you can write out fractional bits. E.g. you predict the next bit is '1' with 75% probability? If it really is 1, you only need about 0.4 bits (-log2(0.75)); if it's 0, you need 2 bits (-log2(0.25)). It may seem strange to work with fractions of a bit, but it works! (Essentially the leftover fraction is carried into the correction of future bits.) You might have heard of Huffman coding, which can't deal with fractional bits; arithmetic coding is a generalization of Huffman coding that can.

Arithmetic coding is essentially optimal at what it does: given a prediction of the data, you waste at most a couple of bits over the entire message. So modern compression techniques don't really compete on the encoding/decoding side at all. What they compete on is modelling the data seen so far and making the most accurate prediction possible of the next bit of data (next byte also works, but working one bit at a time is easier to grasp when learning arithmetic coding).

Incidentally, the encoder and decoder work essentially the same way: given the data read (or decoded) so far, predict the next bit. That part is identical on both sides. The decoder reads the compressed file for the correction, while the encoder reads the input file and writes out the correction.

The important part is "predict the next bit". This is what separates all the different compressors, and it's also where those of us experienced in this area try to correct people on the wrong track. A compression algorithm is never about the encoding side; it is 100% about the prediction of the data. Can you build a model that accurately predicts the next piece of data? That's what you need to do to make a better file compressor. The entropy coding part is already a solved problem; don't bother re-solving it.
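To make the predictor/coder split concrete, here's a toy Python sketch (CountingPredictor and the function names are made up for illustration): an adaptive bit predictor plus the -log2(p) accounting that an ideal arithmetic coder achieves. It skips the integer renormalization and carry handling a real coder needs and just totals up the model's surprise.

    import math
    from collections import defaultdict

    class CountingPredictor:
        """Predict the next bit from the last `order` bits using simple counts."""
        def __init__(self, order=8):
            self.order = order
            self.counts = defaultdict(lambda: [1, 1])  # Laplace-smoothed [zeros, ones]

        def p_one(self, history):
            ctx = tuple(history[-self.order:])
            zeros, ones = self.counts[ctx]
            return ones / (zeros + ones)

        def update(self, history, bit):
            ctx = tuple(history[-self.order:])
            self.counts[ctx][bit] += 1

    def ideal_compressed_bits(bits, predictor):
        """Bits an ideal arithmetic coder would emit: the model's total surprise."""
        history, total = [], 0.0
        for b in bits:
            p1 = predictor.p_one(history)
            p = p1 if b == 1 else 1.0 - p1
            total += -math.log2(p)        # e.g. p=0.75 -> ~0.4 bits, p=0.25 -> 2 bits
            predictor.update(history, b)  # the decoder can make the same update,
            history.append(b)             # since it has already reconstructed this bit
        return total

    # Highly repetitive input compresses well once the model has seen the pattern.
    data = [0, 1, 1, 0] * 1000
    print(len(data), "raw bits ->", round(ideal_compressed_bits(data, CountingPredictor())), "coded bits")

Swap in a better predictor (a context-mixing model, or an LLM as in the parent comment) and the same accounting tells you how small the compressed output gets; the entropy coder itself never changes.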
| ▲ | themafia 4 hours ago |