| ▲ | sigbottle 10 hours ago | |
Have you ever actually worked with a basic Markov problem? The Markov property states that the next state is determined entirely by transition probabilities from the current state. These states inhabit a state space. The way you encode "memory" when you need it — say you need to remember whether it rained on each of the last 3 days — is by expanding that state space: from a single rain/no-rain variable to three, giving 2^3 = 8 states if you need the precise binary information for each day. Being "clever", maybe you assume only the *number* of days it rained in the past 3 matters, and you collapse that to a linear 4 states.

Sure, an LLM is a "Markov chain" of state space size (# tokens)^(context length), at minimum. But that's not a helpful abstraction, and it defeats the original purpose of the Markov observation. The entire point of the Markov observation is that you can represent a seemingly huge predictive model with just a couple of variables in a discrete state space, and ideally you're the clever programmer/researcher who can significantly collapse that space by being, well, clever. Are you deliberately missing the point or what?
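The rain example above can be sketched in a few lines. This is a minimal illustration with made-up transition probabilities (the `p_rain` numbers are assumptions, not from the comment): the "memory" of the last 3 days lives inside the state itself, so the chain stays memoryless over the expanded space.

```python
import random

# Full state: a 3-tuple of booleans (rained on day t-2, t-1, t) -> 2^3 = 8 states.
# "Clever" collapse: only the count of rainy days matters -> 4 states (0..3).

def p_rain(state):
    # Hypothetical probabilities, chosen for illustration only:
    # chance of rain tomorrow depends only on the rainy-day count.
    return {0: 0.1, 1: 0.3, 2: 0.5, 3: 0.7}[sum(state)]

def step(state, rng):
    rained = rng.random() < p_rain(state)
    # Shift the 3-day window: drop the oldest day, append today.
    # Next state depends only on the current state -> Markov property holds.
    return state[1:] + (rained,)

rng = random.Random(0)
state = (False, False, False)
for _ in range(10):
    state = step(state, rng)

full_states = 2 ** 3       # 8 states for the exact binary history
collapsed_states = 3 + 1   # 4 states if only the count matters
print(full_states, collapsed_states)
```

The same trick scales badly in the LLM direction: instead of a 3-day binary window, the "state" is the entire context window, so the analogous count is (# tokens)^(context length) — which is exactly why nobody collapses it by hand.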
| ▲ | chpatrick 10 hours ago | parent [-] | |
> Sure, an LLM is a "Markov chain" of state space size (# tokens)^(context length), at minimum.

Okay, so we're agreed.