woadwarrior01 2 hours ago

> I wonder if anyone has figured out how the information is compressed and calculated the amount of information an LLM can hold depending on its size

You might want to look at Physics of Language Models[1]. IIRC, the authors estimate that a model can store ~2 bits of factual knowledge per parameter.
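
Back-of-the-envelope, taking the ~2 bits/parameter figure at face value (I'm quoting it from memory, so treat it as an assumption; the function name and the model sizes below are made up for illustration):

```python
# Rough sketch: how much factual knowledge fits in a model if it stores
# about 2 bits per parameter. The 2.0 constant is the recalled estimate,
# not a measured value; swap in your own number.
BITS_PER_PARAM = 2.0


def knowledge_capacity_bytes(num_params: float,
                             bits_per_param: float = BITS_PER_PARAM) -> float:
    """Return the estimated factual-knowledge capacity in bytes."""
    return num_params * bits_per_param / 8  # 8 bits per byte


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only as examples.
    for label, n in [("1B", 1e9), ("7B", 7e9), ("70B", 70e9)]:
        gb = knowledge_capacity_bytes(n) / 1e9
        print(f"{label} params: ~{gb:.2f} GB of factual knowledge")
```

So under that assumption a 7B model tops out at around 1.75 GB of facts. This says nothing about how well the model can actually retrieve them.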

[1]: https://physics.allen-zhu.com/