Dispersion loss counteracts embedding condensation in small language models (chenliu-1996.github.io)
15 points by E-Reverance 2 hours ago | 3 comments
aetherspawn 39 minutes ago
It makes sense to me that distributing information across more parameters results in models that can be quantized more heavily (information theory: more bits available). I wonder if anyone has figured out how the information is compressed, and calculated how much information an LLM can hold depending on its size.
lwansbrough 41 minutes ago
Anyone with a billion dollars want to try this and report back?