Remix.run Logo
hackinthebochs 2 hours ago

They are deterministic in the sense that the inference process scores every word in the vocabulary in a deterministic manner. This score map is then sampled from according to the temperature setting. Non-determinism is artificially injected for ergonomic reasons.

>But I think there’s still the question if this process is more similar to thought or a Markov chain.

It's definitely far from a Markov chain. Markov chains treat the past context as a single unit, an N-tuple that has no internal structure. The next state is indexed by this tuple. LLMs leverage the internal structure of the context which allows a large class of generalization that Markov chains necessarily miss.