sqeaky 8 hours ago

Think about all the times an LLM gets something wrong: the fact that would have helped it get the answer right is something that was lost. I suppose that isn't proof it's lossy; maybe we just don't know how to get the data back out.

Or look at it another way: LLMs are just text prediction machines. Whatever information doesn't help them predict the next token, or conflicts with the likelihood of the next token, gets dropped.

Or look at it another way: these things are often trained on many terabytes of internet text, yet even a 200-billion-parameter network is only 100 or 200 GB in size. So something is missing, and that is a far better compression ratio than the best known lossless compression algorithms achieve.
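A rough back-of-envelope calculation makes the mismatch concrete. The corpus size and compressor ratios below are illustrative assumptions; only the ~200-billion-parameter figure comes from the comment above:

  # Back-of-envelope comparison with assumed numbers.
  corpus_bytes = 15e12        # assume ~15 TB of training text
  params = 200e9              # ~200 billion parameters
  bytes_per_param = 1         # assume 8-bit storage, so ~200 GB of weights
  model_bytes = params * bytes_per_param

  ratio = corpus_bytes / model_bytes
  print(f"model size: {model_bytes / 1e9:.0f} GB")
  print(f"apparent 'compression' ratio: {ratio:.0f}x")
  # General-purpose lossless compressors (e.g. zstd, xz) typically manage
  # roughly 3-5x on mixed web text, so a ratio on the order of 75x only
  # makes sense if information is being discarded, i.e. it's lossy.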

Or we can look at it yet another way: these things were never built to be lossless compression systems. We can tell from how they are implemented that they don't retain everything they're trained on; they extract a bunch of statistics.
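A toy sketch of that idea, using a bigram counter as a stand-in for "extracting statistics" (real LLM training is gradient descent over a network, not counting, so this is only an analogy):

  from collections import Counter, defaultdict

  # Keep only next-token counts and throw the original text away.
  def train(text: str) -> dict:
      stats = defaultdict(Counter)
      tokens = text.split()
      for prev, nxt in zip(tokens, tokens[1:]):
          stats[prev][nxt] += 1
      return stats

  def predict_next(stats: dict, token: str):
      if token not in stats:
          return None
      return stats[token].most_common(1)[0][0]

  corpus = "the cat sat on the mat and the cat slept"
  model = train(corpus)
  print(predict_next(model, "the"))   # -> "cat"
  # The counts can regenerate plausible continuations, but the exact
  # sentence order and anything that never affected a count are gone:
  # the "compression" is lossy by construction.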

visarga 7 hours ago | parent | next [-]

I think extraction from the model itself is a bad idea. But extraction from external sources, such as the deep research reports LLMs generate, or from solving problems where we have validation of correctness, is a good idea. The model isn't validating its outputs by simply doing another inference; it consults external sources or gets feedback from code execution. Humans in chat rooms could also provide lots of learning signal, especially when actions are judged, with hindsight, against the outcomes they cause down the line.

So in short, what works is a model plus a way to tell its good outputs from its bad ones.
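A minimal sketch of that loop, where a hypothetical generate() callable stands in for whatever model is used and unit tests stand in for the external validator; the point is only that a sample is kept when an external check passes, not when the model grades itself:

  import subprocess, tempfile

  def passes_external_check(candidate_code: str, test_code: str) -> bool:
      # Validate by actually running the code against its tests, rather
      # than asking the model to judge its own answer with another inference.
      with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
          f.write(candidate_code + "\n\n" + test_code)
          path = f.name
      try:
          result = subprocess.run(["python", path], capture_output=True, timeout=30)
      except subprocess.TimeoutExpired:
          return False
      return result.returncode == 0

  def collect_training_examples(problems, generate, n_samples=4):
      # `generate` is a hypothetical callable wrapping the model; only
      # solutions that pass the external check are kept as training signal.
      kept = []
      for prob in problems:
          for _ in range(n_samples):
              candidate = generate(prob["prompt"])
              if passes_external_check(candidate, prob["tests"]):
                  kept.append({"prompt": prob["prompt"], "completion": candidate})
                  break
      return kept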
