ActorNightly 2 days ago
In theory, you could have a Markov chain large enough to mimic an LLM; its state space would just need to be exponentially larger, since it needs one state per possible context window. After all, an LLM is just matrix multiplies from start to finish. A lot of the other data operations (like normalization) can be represented as matrix multiplies, just less efficiently, in the same way that a transformer can be represented, inefficiently, as a set of fully connected deep layers.
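To make the blow-up concrete, here is a minimal sketch (toy_llm, VOCAB, and CONTEXT_LEN are hypothetical stand-ins, not any real model): any fixed-context next-token model induces a Markov chain with one state per possible context window, so the transition matrix is |V|^n by |V|^n.

    # Sketch: a fixed-context next-token model written out as an explicit
    # Markov chain. All names are illustrative stand-ins.
    import itertools
    import numpy as np

    VOCAB = ["a", "b", "c"]   # toy vocabulary
    CONTEXT_LEN = 2           # toy context window

    def toy_llm(context):
        """Stand-in for an LLM: some next-token distribution given a context."""
        rng = np.random.default_rng(abs(hash(context)) % (2**32))
        p = rng.random(len(VOCAB))
        return p / p.sum()

    # One Markov state per possible context window: |V| ** n states.
    states = ["".join(s) for s in itertools.product(VOCAB, repeat=CONTEXT_LEN)]
    index = {s: i for i, s in enumerate(states)}

    # T[i, j] = probability of appending a sampled token to context i and
    # dropping the oldest token, which lands in context j.
    T = np.zeros((len(states), len(states)))
    for s in states:
        for tok, p in zip(VOCAB, toy_llm(s)):
            T[index[s], index[s[1:] + tok]] = p

    assert np.allclose(T.sum(axis=1), 1.0)  # each row is a valid distribution
    print(f"{len(VOCAB)} tokens, window {CONTEXT_LEN} -> {len(states)} states")

With a real vocabulary (tens of thousands of tokens) and a real context window, |V|^n is astronomical, which is exactly where the practicality objection bites.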
kleiba a day ago
True. But the considerations regarding practicality are not to be ignored.