WithinReason 4 hours ago
> The tokens stored in a KV cache are not arbitrary floating-point data -- they are samples from the exact formal language the model was trained on, and the model is by construction a near-optimal predictor of that language.

You can compress the KV cache to 0 bytes by just recomputing it every token. This observation is not worth an arXiv paper, though.
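To make the "0 bytes" point concrete, here is a minimal sketch (toy single-head attention, hypothetical names, not any particular paper's method): since K and V are deterministic functions of the token prefix and the weights, a decoder can simply recompute them from the prefix at every step instead of storing them, trading compute for memory.

```python
import torch

d = 16
torch.manual_seed(0)
emb = torch.nn.Embedding(100, d)
Wq, Wk, Wv = (torch.nn.Linear(d, d, bias=False) for _ in range(3))

def attend_without_cache(token_ids):
    """Recompute K and V for the whole prefix; return attention output for the newest token."""
    x = emb(torch.tensor(token_ids))        # (T, d) embeddings for the full prefix
    q = Wq(x[-1:])                          # query for the newest token only
    k, v = Wk(x), Wv(x)                     # K/V rebuilt from scratch on every call -- nothing is cached
    attn = torch.softmax(q @ k.T / d**0.5, dim=-1)
    return attn @ v                         # (1, d)

# Each decoding step passes the full prefix, so no K/V tensors persist between steps:
# the "cache" occupies 0 bytes, and the price is re-running the projections over all prior tokens.
prefix = [3, 17, 42]
out = attend_without_cache(prefix)
print(out.shape)  # torch.Size([1, 16])
```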