Dispersion loss counteracts embedding condensation in small language models (chenliu-1996.github.io)
15 points by E-Reverance 2 hours ago | 3 comments
aetherspawn 39 minutes ago | parent | next [-]

It makes sense to me that distributing information across more parameters results in models that can be quantized more heavily (information theory - more bits available).

I wonder if anyone has figured out how the information is compressed and calculated the amount of information an LLM can hold depending on its size
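
Not an answer to that, but the "more bits available" part can at least be put as a trivial upper bound: a model cannot store more information than the raw bits in its weights. A rough sketch in Python, where the helper names and the 1B-parameter figure are made up for illustration and nothing here comes from the paper:

```python
def raw_capacity_bits(n_params: int, bits_per_weight: int) -> int:
    """Trivial upper bound: counts every weight bit as usable storage.

    This says nothing about how much information training actually
    puts into the weights; it is only the ceiling set by storage.
    """
    return n_params * bits_per_weight


def gib(bits: int) -> float:
    """Convert a bit count to GiB (2**30 bytes)."""
    return bits / 8 / 2**30


# Illustrative assumption: a 1B-parameter model at common precisions.
for bits_per_weight in (16, 8, 4):
    total = raw_capacity_bits(1_000_000_000, bits_per_weight)
    print(f"{bits_per_weight}-bit: {total:,} bits = {gib(total):.2f} GiB")
```

The ceiling halves each time the precision is halved, so heavy quantization only stays lossless if the model was using well under that ceiling to begin with.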

lwansbrough 41 minutes ago | parent | prev [-]

Anyone with a billion dollars want to try this and report back?

nullc 26 minutes ago | parent [-]

From the paper it appears that it's probably more useful on small-ish models.