Al-Khwarizmi 5 days ago

Many people use this kind of reasoning to argue that LLMs can't be creative, are destined to write bland text, etc. (one notable example was Ted Chiang in the New Yorker), but it has never made any sense.

In my view, the simplest mental model for roughly explaining what LLMs do is a Markov chain. Of course, comparing LLMs to a Markov chain is a gross simplification, but it's one that can only make you underestimate them, never overestimate them, for obvious reasons.
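
To make that mental model concrete, here is a minimal sketch in Python: a bigram table mapping each word to a probability distribution over the words that can follow it. The toy corpus is made up for illustration.

    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the cat ate the fish".split()

    # Count how often each word follows each other word.
    transitions = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        transitions[prev][nxt] += 1

    # Normalize the counts into next-word probabilities.
    model = {
        word: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
        for word, counts in transitions.items()
    }

    print(model["the"])  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}

An LLM is, very loosely, doing the same job, just with a vastly richer notion of context than "the previous word".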

Well, even a Markov chain can surprise you. Because it predicts the next word probabilistically, a dice roll that comes out just right can select a low-probability word in just the right place and generate original, unexpected text.
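
A quick sketch of that dice roll, with a made-up next-word distribution: greedy decoding would emit "cat" every time, but sampling occasionally lands on the rare word, and that is exactly where the surprise comes from.

    import random
    from collections import Counter

    # Hypothetical next-word distribution for some context.
    next_word_probs = {"cat": 0.90, "dog": 0.08, "quasar": 0.02}

    words = list(next_word_probs)
    weights = list(next_word_probs.values())

    # Sample 10,000 next words; the low-probability option does get picked.
    draws = Counter(random.choices(words, weights=weights, k=10_000))
    print(draws)  # roughly Counter({'cat': 9000, 'dog': 800, 'quasar': 200})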

Add to this that LLMs are much better at "Markov chaining" than Markov chains themselves, that instruction tuning (including RLHF) can be used to bias the model towards the more creative/original text that humans like, and that LLMs often pull things off in ways we don't even really understand - and claims like these sound very naive.