dTal 3 hours ago:
I am quite certain. The output is "just tokens"; the "position encodings" and "context" are inputs to the LLM function, not outputs. The information that a token can carry is bounded by the entropy of that token. A highly predictable token (given the context) simply can't communicate anything. Again: if a tiny language model, or even a basic Markov model, would also predict the same token, it's a safe bet that token doesn't encode any useful thinking when the big model spits it out.
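(The information-theoretic point dTal is making can be sketched in a few lines. This is a minimal illustration, not from the thread: it just computes the self-information, in bits, of a token as a function of the probability the model assigns it. The probabilities used are made-up example values.)

```python
import math

def surprisal_bits(p: float) -> float:
    """Self-information of an event with probability p, in bits.

    This is the upper bound on how much information observing that
    event (e.g. a sampled token) can convey: -log2(p).
    """
    return -math.log2(p)

# A token the model predicts with near-certainty carries almost nothing:
print(f"{surprisal_bits(0.999):.4f} bits")   # ~0.0014 bits

# A 50/50 token carries exactly one bit:
print(f"{surprisal_bits(0.5):.4f} bits")     # 1.0000 bits

# A genuinely surprising token (e.g. 1-in-50,000 under the model's
# distribution) carries far more:
print(f"{surprisal_bits(1 / 50000):.4f} bits")  # ~15.6 bits
```

So on this view, tokens that any small model would also predict sit near the top of the first case: whatever "thinking" the big model did, very little of it can be read off from emitting them.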
Chance-Device 3 hours ago (parent):
I just don’t share your certainty. You may or may not be right, but if there isn’t a result showing this, then I’m not going to assume it.