hansvm 8 hours ago |
There's a certain one-to-oneness between tokens and embeddings: a token expands into a large chunk of state, and processing happens on that state and nothing else. The point is that there isn't any additional state or reasoning on the side. You have a bunch of things equivalent to tokens, and the only trained operations act on sequences of those things. Calling them "tokens" is a reasonable linguistic choice, since the exact representation of a token isn't core to the argument being made. |
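
The one-to-one expansion can be sketched in a few lines. This is a minimal illustration, not any particular model's implementation; all names (VOCAB, embed, process) are hypothetical, and the "trained operation" is a trivial stand-in:

```python
EMBED_DIM = 4
VOCAB = {"the": 0, "cat": 1, "sat": 2}  # hypothetical toy vocabulary

# One embedding row per token id -- the "large amount of state" a token expands into.
embedding_table = [
    [float(i * EMBED_DIM + j) for j in range(EMBED_DIM)]
    for i in range(len(VOCAB))
]

def embed(token_ids):
    # One-to-one: token id i maps to row i, and nothing else enters the model.
    return [embedding_table[t] for t in token_ids]

def process(states):
    # Stand-in for a trained sequence operation: it consumes only the
    # sequence of embedded states, with no extra hidden state alongside.
    total = sum(s[0] for s in states)
    return [[x + total for x in state] for state in states]

ids = [VOCAB[w] for w in ["the", "cat", "sat"]]
out = process(embed(ids))
assert len(out) == len(ids)  # one output state per input token
```

Whether the sequence elements are raw token ids or their embedded vectors, the model only ever sees a sequence of these token-equivalent things, which is why the word "token" stretches naturally to cover both.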