ActorNightly 2 days ago

In theory, you could have a large enough Markov chain that mimics an LLM; you would just need it to be exponentially larger in width.
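A rough sketch of that equivalence (toy vocabulary, and next_token_probs is a made-up stand-in for a real model's forward pass): the chain's state is the whole context window, and its transition table is just the model's output tabulated for every possible state, which is why it blows up so fast.

    # Sketch: an LLM with a fixed context window is a Markov chain whose
    # state is the entire window. next_token_probs() is a hypothetical
    # stand-in for a real model's forward pass.
    import itertools, random

    VOCAB = ["a", "b", "c"]      # toy vocabulary
    CONTEXT_LEN = 2              # toy context window

    def next_token_probs(context):
        # Hypothetical model: any deterministic map from context to a
        # distribution over VOCAB would do here.
        random.seed(hash(context))
        w = [random.random() for _ in VOCAB]
        s = sum(w)
        return [x / s for x in w]

    # Tabulate the transition table: one row per possible context.
    # For a real LLM this is |VOCAB| ** CONTEXT_LEN rows -- hopelessly big.
    chain = {ctx: next_token_probs(ctx)
             for ctx in itertools.product(VOCAB, repeat=CONTEXT_LEN)}

    # Sampling from the chain is exactly sampling from the model.
    state = ("a", "b")
    for _ in range(5):
        nxt = random.choices(VOCAB, weights=chain[state])[0]
        state = state[1:] + (nxt,)
        print(nxt, end=" ")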

After all, it's just matrix multiplies start to finish.
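Roughly, for one attention head in numpy (toy shapes, nothing real), about the only non-matmul step in the core computation is the elementwise softmax:

    # One attention head as a chain of matrix multiplies (numpy sketch).
    import numpy as np

    T, d_model, d_head = 8, 16, 4              # toy sizes
    X  = np.random.randn(T, d_model)           # token embeddings
    Wq = np.random.randn(d_model, d_head)
    Wk = np.random.randn(d_model, d_head)
    Wv = np.random.randn(d_model, d_head)
    Wo = np.random.randn(d_head, d_model)

    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # matmuls
    scores  = Q @ K.T / np.sqrt(d_head)        # matmul
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax (elementwise)
    out     = weights @ V @ Wo                 # matmuls
    print(out.shape)                           # (8, 16)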

A lot of the other data operations (like normalization) can be represented as matrix multiplies, just less efficiently, in the same way that a transformer can be represented, inefficiently, as a set of fully connected deep layers.
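For example, mean-centering is literally a multiply by the fixed matrix I - J/n, and the divide-by-std part can be folded into a data-dependent diagonal matrix, at O(n^2) cost instead of O(n) (numpy sketch):

    # Normalization written as matrix multiplies (numpy sketch).
    import numpy as np

    n = 6
    x = np.random.randn(n)

    C = np.eye(n) - np.ones((n, n)) / n     # fixed centering matrix
    centered_fast   = x - x.mean()          # O(n)
    centered_matmul = C @ x                 # O(n^2), same result
    assert np.allclose(centered_fast, centered_matmul)

    # The 1/std scaling is a data-dependent diagonal matrix, so even the
    # nonlinear part of the normalization becomes a (per-input) matmul.
    D = np.eye(n) / centered_matmul.std()
    normalized = D @ C @ x
    assert np.allclose(normalized, (x - x.mean()) / (x - x.mean()).std())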

kleiba a day ago

True. But the considerations re: practicality are not to be ignored.