WithinReason 4 hours ago

> The tokens stored in a KV cache are not arbitrary floating-point data -- they are samples from the exact formal language the model was trained on, and the model is by construction a near-optimal predictor of that language.

You can "compress" the KV cache to 0 bytes by just recomputing it from the prompt on every token; the cache is only a time/space trade-off. That observation isn't worth an arXiv paper, though.
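For what it's worth, the trade being pointed at is easy to sketch with a toy operation count (hypothetical helpers, not a real transformer: `ops` just counts attention position-pairs touched):

```python
def generate_with_cache(n_tokens):
    """Keep K/V for every past position: O(t) work per step, O(n) memory."""
    cache = []          # stands in for the stored key/value vectors
    ops = 0
    for t in range(n_tokens):
        cache.append(t)      # append this step's K/V
        ops += len(cache)    # attend over all cached positions
    return ops               # ~ n^2 / 2 total work

def generate_without_cache(n_tokens):
    """Store nothing: re-run the whole prefix each step, O(t^2) work per step."""
    ops = 0
    for t in range(n_tokens):
        for pos in range(t + 1):  # recompute K/V for every prefix position
            ops += pos + 1        # each recomputed position attends over its own prefix
    return ops                    # ~ n^3 / 6 total work

print(generate_with_cache(100), generate_without_cache(100))  # 5050 171700
```

Same output either way; the cache just converts cubic total work into quadratic by spending O(n) memory.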