yobbo 2 days ago

A hidden Markov model (HMM) is theoretically capable of modelling text just as well as any transformer. Typically, an HMM is a probability distribution over a hidden discrete state space, but the distribution and state space can be anything. The size of the state space and the transition function determine its capacity. RNNs are effectively HMMs, and recent ones like "Mamba" are considered competent.
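
As a concrete sketch of what such a model computes (illustrative NumPy code with toy sizes chosen arbitrarily, not from any particular paper): the transition matrix A and emission matrix B are the whole model, and the standard forward recursion yields both a sequence likelihood and a next-token distribution, which is all a language model needs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, vocab = 8, 16   # toy sizes; capacity grows with the state count

# A[i, j] = P(next state j | state i); B[i, k] = P(token k | state i)
A = rng.dirichlet(np.ones(n_states), size=n_states)
B = rng.dirichlet(np.ones(vocab), size=n_states)
pi = rng.dirichlet(np.ones(n_states))  # initial state distribution

def forward(tokens):
    """Forward algorithm: log P(tokens) and the filtered state distribution."""
    alpha = pi * B[:, tokens[0]]
    log_p = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in tokens[1:]:
        alpha = (alpha @ A) * B[:, t]
        s = alpha.sum()
        log_p += np.log(s)
        alpha /= s
    return log_p, alpha

def next_token_dist(tokens):
    """Predictive distribution over the next token, i.e. a language model."""
    _, alpha = forward(tokens)
    return (alpha @ A) @ B

tokens = list(rng.integers(0, vocab, size=20))
log_p, _ = forward(tokens)
print(log_p, next_token_dist(tokens).sum())  # the distribution sums to ~1.0
```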

Transformers can be interpreted as a trick that recreates the state as a function of the context window.

I don't recall reading about attempts to train very large discrete (million-state) HMMs on modern text tokens.
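
For a rough sense of the scale involved (my back-of-envelope arithmetic, not anything from a published attempt, and the ~50k vocabulary is an assumed GPT-2-ish figure): a dense transition matrix over a million states is already on the order of 10^12 parameters before counting emissions, so such a model would presumably need a sparse or factored transition structure.

```python
n_states = 1_000_000   # "million states"
vocab = 50_000         # assumed GPT-2-ish token vocabulary

transition_params = n_states ** 2      # dense P(next state | state)
emission_params = n_states * vocab     # dense P(token | state)
print(f"transition: {transition_params:.3e}")  # 1.000e+12
print(f"emission:   {emission_params:.3e}")    # 5.000e+10
```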