The problem with that argument is that it is trivial to write a Markov chain program that takes in text and then generates the most probable series of words from a starting word. I wrote such a program myself in BASIC on a 64K 8-bit computer in the 1980s, after reading one of A.K. Dewdney's columns. That wasn't an LLM at all, though. There's a connection, sure, but drawing it is like equating a paper airplane with a jet airliner.
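
For concreteness, here is a minimal sketch of that kind of word-level Markov chain, in Python rather than BASIC. It is not the original program: the names `build_chain` and `generate` and the toy corpus are made up for illustration, and it assumes the "most probable series of words" is built greedily by always taking the most frequent successor.

```python
from collections import Counter, defaultdict

def build_chain(text):
    """Count, for each word, how often each other word follows it."""
    words = text.split()
    chain = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        chain[current][following] += 1
    return chain

def generate(chain, start, length=8):
    """Greedily follow the most frequent successor, starting from `start`."""
    out = [start]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:
            break  # dead end: this word was never followed by anything
        # Ties go to whichever successor was seen first in the text.
        out.append(successors.most_common(1)[0][0])
    return " ".join(out)

# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the cat ate the fish"
chain = build_chain(corpus)
print(generate(chain, "the", length=5))
```

The whole model is a table of successor counts keyed on a single previous word. Sampling from the counts instead of always taking the maximum would give more varied output, but the table stays the same.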

Charon77 7 hours ago | parent [-]

The issue with Markov chains is that you can't get good next-token prediction over a long enough context: once you condition on the last 1,000 words instead of just 2, it's quite unlikely that your frequency table has an entry for that exact combination. Markov chains also don't work on token embeddings, which allow some encoding of meaning.
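
A rough way to see the sparsity problem is to measure how often a context from new text already has an entry in the frequency table as the context gets longer. This is only a sketch on toy data: `build_table`, `coverage`, and both word lists are hypothetical names and examples chosen for illustration.

```python
from collections import Counter, defaultdict

def build_table(words, order):
    """Map each `order`-word context to counts of the word that follows it."""
    table = defaultdict(Counter)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        table[context][words[i + order]] += 1
    return table

def coverage(train, test, order):
    """Fraction of `order`-word contexts in `test` that were seen in `train`."""
    table = build_table(train, order)
    contexts = [tuple(test[i:i + order]) for i in range(len(test) - order)]
    if not contexts:
        return 0.0
    hits = sum(1 for context in contexts if context in table)
    return hits / len(contexts)

# Two toy sentences built from the same vocabulary in a different order.
train = "the cat sat on the mat and the dog sat on the rug".split()
test = "the dog sat on the mat and the cat sat on the rug".split()

for order in (1, 2, 4, 6):
    print(order, round(coverage(train, test, order), 2))
```

Even on two near-identical sentences, exact-match coverage drops once the context reaches a few words. With a real vocabulary and a 1,000-word context, almost every lookup would miss.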

	AlecSchueler an hour ago | parent [-]
		> and markov chain don't work on token embedding that allows some encoding of meaning.

		Working on an "encoding of meaning" sure sounds a lot like reasoning.