Remix clone Hacker News

	spott 4 days ago
		KV cache for dense models is on the order of 50% of the parameters. For sparse MoE models it can be significantly smaller, I believe, but I don’t think it is measured in KB.
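
For a rough sense of scale, here is a minimal sketch of the usual KV cache size arithmetic for a standard transformer decoder. The layer count, KV head count, head dimension, context length, and 2-byte (fp16) element size are illustrative assumptions, not figures for any particular model, and `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a standard attention stack.

    The leading factor of 2 counts one key tensor plus one value tensor
    per layer. bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical dense configuration (assumed values, for illustration only).
cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128)

per_token_bytes = kv_cache_bytes(**cfg, seq_len=1)
total_bytes = kv_cache_bytes(**cfg, seq_len=4096)

print(f"per token: {per_token_bytes / 1024:.0f} KiB")      # 512 KiB
print(f"4096-token context: {total_bytes / 1024**3:.0f} GiB")  # 2 GiB
```

Under these assumptions the cache grows linearly with context length and batch size, so how it compares to the parameter size depends heavily on both.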