kybernetikos | 5 days ago
For fun over the last few days, I've built a compressor / decompressor that uses the logits from an LLM for each token in the input, takes each token's rank under those predictions, and exponential-Golomb encodes the ranks. Then you work in reverse to regenerate the original.

It took me ages to get the prediction for the second token after "hello" to match the prediction for the second token when running the model on the string "hello world", despite the fact that I was using a causal model. I tried all kinds of things before discovering that `quantized: false` was the important setting.
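In rough outline, the idea is something like this (a minimal Python sketch of the approach, not the actual implementation; the model name and helper names are illustrative, and the real version evidently ran on a JS runtime given the `quantized: false` setting):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def exp_golomb_encode(n: int) -> str:
        """Order-0 exponential-Golomb code for a non-negative integer."""
        b = bin(n + 1)[2:]                 # n+1 in binary
        return "0" * (len(b) - 1) + b      # prefix with len-1 zero bits

    def token_ranks(model, tokenizer, text: str) -> list[int]:
        """Rank of each token in the model's prediction from the preceding tokens."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits     # shape: (1, seq_len, vocab_size)
        ranks = []
        for i in range(ids.shape[1] - 1):
            order = torch.argsort(logits[0, i], descending=True)
            rank = (order == ids[0, i + 1]).nonzero().item()
            ranks.append(rank)
        return ranks

    # Illustrative model; the first token has no prediction and is stored as-is.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    bits = "".join(exp_golomb_encode(r) for r in token_ranks(model, tok, "hello world"))

Decompression runs the same model over the tokens decoded so far and picks the token at each stored rank, which is why the predictions have to match exactly between the two passes (and why the quantization setting mattered).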
giveita | 5 days ago | parent
What's the Weissman score? Or more seriously :) did it perform well? Sounds like it should. If more and more text is AI slop, it should do well. I don't fully understand what you said, but I guess higher-probability logits are encoded with fewer bits. If your text is the LLM output, then you may need a bit or two per token?
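If I have the encoding right, the code lengths would grow something like this (a quick sketch, assuming order-0 exponential-Golomb as described above):

    def exp_golomb_encode(n: int) -> str:
        # order-0 exp-Golomb: n+1 in binary, prefixed by len-1 zero bits
        b = bin(n + 1)[2:]
        return "0" * (len(b) - 1) + b

    for rank in range(5):
        code = exp_golomb_encode(rank)
        print(rank, code, f"{len(code)} bits")
    # 0 -> "1" (1 bit), 1 -> "010", 2 -> "011" (3 bits), 3 -> "00100", 4 -> "00101" (5 bits)

So a rank-0 prediction costs a single bit, and the cost grows only logarithmically with how far down the list the true token falls.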