HappMacDonald 5 hours ago

I'd direct you to the 3Blue1Brown presentation on this topic, but in a nutshell: the semantic space for an embedding can become much richer than the initial token mapping thanks to the preceding context, but only during the course of predicting the next token.

Once that's done, all of the rich nuance built up during that token-prediction step is lost, then rebuilt from scratch on the next step (often taking a new direction because of the new token, and even more dramatically because of changes at the tail of the context window: dropped tokens or messages, rearrangement due to summarizing, and so on).
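Here's a minimal sketch of that recomputation, assuming GPT-2 via Hugging Face transformers purely as a stand-in model. With use_cache=False the loop literally recomputes every layer's states for every position from the raw token ids on each step; the only thing carried forward is the token that was picked. (Real inference usually caches the keys/values of earlier positions to save compute, but the representations of those earlier tokens are still never updated with anything learned on later steps.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("I threw the red ball and it", return_tensors="pt").input_ids
for _ in range(5):
    with torch.no_grad():
        # use_cache=False: all hidden states are rebuilt from the token ids
        # on every step; nothing "rich" survives from the previous step
        out = model(ids, use_cache=False)
    next_id = out.logits[0, -1].argmax()           # greedy pick of the next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tok.decode(ids[0]))
```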

So if you say "red ball" somewhere in the context window, then during each prediction step that phrase expands into a semantic embedding that matches neither "red" nor "ball" alone; that richer representation is not "remembered" between steps, though, but rebuilt from scratch every time.
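A hedged sketch of that point, again assuming GPT-2 (the token position for " ball" below is specific to this prompt and tokenizer): the final-layer vector for "ball" differs from its raw embedding once context is mixed in, and it only exists inside a single forward pass.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

enc = tok("the red ball bounced", return_tensors="pt")
with torch.no_grad():
    out = model(**enc, output_hidden_states=True)

ball_pos = 2  # position of " ball" for this prompt/tokenizer (assumption)
static_vec = out.hidden_states[0][0, ball_pos]    # embedding before any attention
context_vec = out.hidden_states[-1][0, ball_pos]  # after all layers: "red ball" in context

cos = torch.nn.functional.cosine_similarity(static_vec, context_vec, dim=0)
print(f"cosine similarity, static vs. contextual: {cos:.3f}")
# context_vec is discarded after this forward pass; the next prediction
# step recomputes it from the token ids alone
```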