Remix clone Hacker News


	▲	aetherspawn 2 hours ago
		It makes sense to me that distributing information across more parameters results in models that can be quantized more heavily (information theory: more bits available). I wonder if anyone has figured out how the information is compressed and calculated the amount of information an LLM can hold depending on its size.
	▲	woadwarrior01 an hour ago
		> I wonder if anyone has figured out how the information is compressed and calculated the amount of information an LLM can hold depending on its size

		You might want to look at Physics of Language Models[1]. IIRC, the authors estimate it to be ~2 bits of factual knowledge per parameter.

		[1]: https://physics.allen-zhu.com/
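
		For a rough sense of scale, here is a back-of-the-envelope sketch of what ~2 bits per parameter would imply. The 2-bit figure is the recalled ("IIRC") number from the comment above, not a verified constant, and the function names and model sizes are made up for illustration. The ratio against 4-bit storage is plain arithmetic, not a claim about how far quantization can actually go.

```python
# Back-of-the-envelope only. BITS_PER_PARAM is an assumption: the ~2 bits of
# factual knowledge per parameter recalled (IIRC) in the comment above.
BITS_PER_PARAM = 2.0


def knowledge_capacity_bits(num_params, bits_per_param=BITS_PER_PARAM):
    """Estimated factual knowledge, in bits, for a model of the given size."""
    return num_params * bits_per_param


def storage_bits(num_params, bits_per_weight):
    """Raw bits used to store the weights at a given precision."""
    return num_params * bits_per_weight


def bits_to_gigabytes(bits):
    """Convert bits to decimal gigabytes (1 GB = 1e9 bytes)."""
    return bits / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to show the scaling.
    for label, n in [("1B", 1e9), ("7B", 7e9), ("70B", 70e9)]:
        knowledge_gb = bits_to_gigabytes(knowledge_capacity_bits(n))
        int4_gb = bits_to_gigabytes(storage_bits(n, 4))
        ratio = knowledge_capacity_bits(n) / storage_bits(n, 4)
        print(f"{label}: ~{knowledge_gb:.2f} GB of knowledge, "
              f"{int4_gb:.2f} GB of 4-bit weights, ratio {ratio:.2f}")
```

		Under that assumption the estimate is linear in parameter count, so the ratio of estimated knowledge to storage depends only on the weight precision, not on model size.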