thinkling 8 hours ago

I thought embeddings were the internal representation? Do reasoning and thinking get expanded back out into tokens and fed back in as the next prompt for reasoning? Or does the model internally churn on chains of embeddings?

HappMacDonald 5 hours ago

I'd direct you to the 3Blue1Brown presentation on this topic, but in a nutshell: the semantic space for an embedding can become much richer than the initial token mapping, thanks to the preceding context, but only during the course of predicting the next token.

Once that's done, all the rich nuance achieved during the last token-prediction step is lost, then rebuilt from scratch on the next token-prediction step, oftentimes taking a new direction due to the new token, and, often more powerfully, due to any changes at the tail of the context window such as dropped tokens, dropped messages, or re-arrangement due to summarizing.

So if you say "red ball" somewhere in the context window, then during each prediction step that phrase will expand into a semantic embedding that matches neither "red" nor "ball" alone; but that richer information is not "remembered" between steps, it's rebuilt from scratch every time.
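
Here's a rough sketch of what I mean in code (this assumes PyTorch plus the Hugging Face transformers package, with the public "gpt2" checkpoint purely as a stand-in; any causal LM would do). The only thing carried from one step to the next is the flat list of token ids; the rich per-position embeddings are recomputed inside each forward pass:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    # The only state that persists between steps: a sequence of token ids.
    input_ids = tokenizer("A red ball rolled", return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(10):
            # Each step re-runs the full forward pass over the token sequence,
            # rebuilding the contextualized embeddings ("red ball" as rich
            # per-position vectors) from scratch.
            logits = model(input_ids).logits
            next_id = logits[0, -1].argmax()                    # greedy choice
            input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(input_ids[0]))

(Production decoders cache keys/values so they don't literally redo all that arithmetic, but the cache is just a memo of the same computation; it doesn't carry any information beyond what the tokens themselves determine.)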

hansvm 8 hours ago

There's a certain one-to-one correspondence between tokens and embeddings. A token expands into a large amount of state, and processing happens on that state and nothing else.

The point is that there isn't any additional state or reasoning. You have a bunch of things equivalent to tokens, and the only trained operations deal with sequences of those things. Calling them "tokens" is a reasonable linguistic choice, since the exact representation of a token isn't core to the argument being made.
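
As a toy illustration of that one-to-one mapping (a sketch with made-up sizes and token ids, assuming PyTorch): the input layer is literally a lookup table, one row per token, and everything the trained layers ever see is a sequence of those rows.

    import torch
    import torch.nn as nn

    vocab_size, d_model = 50_000, 768          # made-up sizes, for illustration only
    embed = nn.Embedding(vocab_size, d_model)  # a plain lookup table: one row per token id

    token_ids = torch.tensor([17, 4243, 98])   # hypothetical ids for "red", " ball", "."
    x = embed(token_ids)                       # shape (3, 768): exactly one vector per token

    # Everything downstream (attention blocks, MLPs) is trained to operate on
    # sequences of vectors like x; there is no separate scratchpad state that
    # persists outside of them.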