alanfranz 3 days ago

I think it’s an old analogy, and a good one. LLMs are for knowledge what mp3s were for audio.

This was widely discussed in the past years as well.

cush 2 days ago | parent | next [-]

It's a terrible analogy because LLMs are already for audio what they are for knowledge: you can use LLMs to create new songs and sounds. Encyclopedias don't create new songs.

simonw 3 days ago | parent | prev [-]

The older analogy was to JPEG compression - I linked to that in my post (the Ted Chiang link). https://www.newyorker.com/tech/annals-of-technology/chatgpt-...

saberience 3 days ago | parent [-]

This analogy has been used for machine learning since well before ChatGPT; my co-workers and I were discussing this idea, but for LSTM models, in roughly 2018.

What’s old is new again.

simonw 3 days ago | parent [-]

Are you talking about lossy compression or a lossy encyclopedia?

saberience 3 days ago | parent [-]

I work in the AI field and I've heard every analogy possible for LLMs since ChatGPT was released, including many variants of encyclopedias (Grolier/Encarta etc). Analogies to encyclopedias have always seemed (to me) quite limited: encyclopedias are just static data stores, and they're riddled with errors and out of date (just like LLMs). LLMs, however, can produce output which is completely novel and has never been seen before.

Pre-LLMs, we had already been working on content generation using prior tech, including texture generation before diffusion models and voice generation (although it sounded terrible). At my company we spent hours discussing the differences between various data compression algorithms and ML techniques/model architectures, what was happening inside ML models, and also what was happening inside our brains! But even then we didn't think anything we were discussing was novel; these ideas were (and still are) obvious.

Anyway, back on the topic of the LLM as encyclopedia: you can USE an LLM for encyclopedia-like workloads, and in some cases it is better or worse than an actual encyclopedia. But in the end, encyclopedias are written by flawed humans, just like all the data that went into training the LLM. Both encyclopedias and LLMs are flawed, in different ways, but LLMs at least can do new things.

I actually think a better analogy for an LLM is the human brain rather than an encyclopedia, lossy or not. I think we massively overrate our brains and underrate LLMs. The older I've gotten, the more I realize that the vast majority of people talk absolute rubbish most of the time, exaggerate their knowledge, spout "truths" which are totally inaccurate, and fake it till they make it throughout most of their lives. If you were fact-checking the entire population on everything they said day to day, I think the level of "hallucination" would be much higher than Claude Opus 4.1's. That is, I think our level of scrutiny is MUCH higher for LLMs than for our friends and co-workers. When another human says something like "New York has a higher level of crime than Buenos Aires", we usually take them at face value, due to various psychological and social priming. But we fact-check our LLMs on statements like these.

simonw 2 days ago | parent [-]

My analogy isn't to an encyclopedia, it's to a "lossy encyclopedia" - I don't think a plain encyclopedia analogy is valid precisely because LLMs don't have perfect factual recall.