▲ | frotaur 5 days ago | |
I don't know about Golomb coding, but with arithmetic coding (AC) you can do stream decoding, if I remember correctly. I supervised a student's project whose goal was exactly that: implementing compression with LLMs using AC. Since AC is near-optimal, if your LLM has an average cross-entropy of x nats per token on some dataset, you can expect the compressed output to use about x nats per token on average (that is, x / ln 2 bits per token)! | ||
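To put the "x nats per token" figure in more familiar units, here is a rough back-of-the-envelope sketch. The function name and the example numbers are made up for illustration, and it ignores the small constant overhead a real arithmetic coder adds at the end of the stream.

```python
import math

def expected_compressed_bytes(cross_entropy_nats, n_tokens):
    """Estimate compressed size from an average cross-entropy in nats per token."""
    # 1 nat = 1 / ln(2) bits, so divide by ln(2) to get bits per token.
    bits_per_token = cross_entropy_nats / math.log(2)
    return bits_per_token * n_tokens / 8

# Hypothetical numbers: a model at 2.0 nats per token on a 1000-token text.
print(expected_compressed_bytes(2.0, 1000))
```

So the better the model predicts the text (lower cross-entropy), the smaller the estimate, in direct proportion.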
▲ | kybernetikos 4 days ago | parent [-] | |
Arithmetic coding looks like an extremely interesting approach, given that you can use the model at each step to give you the probabilities of each token. |
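For anyone curious what "use the model at each step" looks like, here is a minimal toy sketch. It assumes a three-token vocabulary and a made-up `toy_model` (smoothed counts) standing in for the LLM, and it uses exact fractions instead of the fixed-precision integer arithmetic a real coder would use, so it is only suitable for short messages.

```python
import math
from fractions import Fraction

VOCAB = ["a", "b", "c"]  # toy vocabulary, an assumption for this sketch

def toy_model(context):
    """Hypothetical stand-in for an LLM: next-token probabilities given the context."""
    counts = {t: 1 + context.count(t) for t in VOCAB}  # Laplace-smoothed counts
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in counts.items()}

def intervals(probs):
    """Give each token its own [start, end) slice of [0, 1), sized by its probability."""
    out, start = {}, Fraction(0)
    for tok in VOCAB:
        out[tok] = (start, start + probs[tok])
        start += probs[tok]
    return out

def encode(tokens):
    low, width = Fraction(0), Fraction(1)
    for i, tok in enumerate(tokens):
        # Ask the model for probabilities given everything before position i.
        lo, hi = intervals(toy_model(tokens[:i]))[tok]
        low, width = low + width * lo, width * (hi - lo)
    # Smallest k with 2**-k <= width / 2 guarantees a k-bit number inside the interval.
    k = 1
    while Fraction(1, 2**k) > width / 2:
        k += 1
    return format(math.ceil(low * 2**k), f"0{k}b")

def decode(bits, n_tokens):
    value = Fraction(int(bits, 2), 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    tokens = []
    for _ in range(n_tokens):
        # Same model, same context as the encoder, so the intervals match exactly.
        for tok, (lo, hi) in intervals(toy_model(tokens)).items():
            if low + width * lo <= value < low + width * hi:
                tokens.append(tok)
                low, width = low + width * lo, width * (hi - lo)
                break
    return tokens

msg = list("abacabaaab")
bits = encode(msg)
assert decode(bits, len(msg)) == msg
```

The decoder emits tokens one at a time and only ever needs the tokens it has already decoded as context, which is why this pairs naturally with an autoregressive model. The final interval width is the product of the probabilities the model assigned to the actual tokens, so the bit count lands within about two bits of the sum of -log2(p) over the message.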