hansvm 8 hours ago

There's a certain one-to-one correspondence between tokens and embeddings: each token expands into a large block of state, and processing happens on that state and nothing else.

The point is that there isn't any additional state or reasoning. You have a bunch of things equivalent to tokens, and the only trained operations deal with sequences of those things. Calling them "tokens" is a reasonable linguistic choice, since the exact representation of a token isn't core to the argument being made.
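A minimal numpy sketch of that one-to-oneness (sizes and names are illustrative, not taken from any particular model): each token id indexes exactly one row of embedding state, and a trained operation such as self-attention reads and writes only the resulting sequence of per-token vectors.

```python
import numpy as np

# Hypothetical sizes, purely for illustration
vocab_size, d_model = 1000, 64

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, d_model))  # one row of state per token id

token_ids = np.array([12, 7, 993, 7, 42])    # a sequence of tokens
states = embedding_table[token_ids]          # 1:1 expansion -> shape (seq_len, d_model)

# A single (unmasked) self-attention step: every trained operation below reads and
# writes only this (seq_len, d_model) array -- no separate memory or scratchpad.
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

q, k, v = states @ W_q, states @ W_k, states @ W_v
scores = q @ k.T / np.sqrt(d_model)
scores -= scores.max(axis=-1, keepdims=True)               # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
states = weights @ v                          # still exactly one state vector per token
```

The output has the same shape as the input, one vector per token, which is the sense in which there is no additional state beyond the token-aligned sequence.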