jerf 8 hours ago:
There is a study showing that what the model is doing behind the scenes in those cases is a lot more than just emitting those tokens. For an LLM, tokens are thought: it has no ability to think, by whatever definition of that word you like, without outputting something. The token itself represents only a tiny fraction of the internal state changes made each time one is emitted.

Clearly there is an optimum for each task (not necessarily a global one), and a concrete model on a given task can be arbitrarily far from it. But you'd need to test that case by case, not just assume that "less tokens = more better". If you're not testing, you can be forcing your model to be dumber without realizing it.
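A minimal sketch of that point, assuming the Hugging Face transformers library and the small "gpt2" checkpoint (stand-ins for illustration, not anything the comment names): the visible output of one decoding step is a single integer, but producing it means computing a high-dimensional activation vector at every layer.

    # Compare the size of one emitted token with the hidden state
    # computed to produce it. Assumes `pip install torch transformers`.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)

    # The visible part: one token id, a single integer.
    next_id = out.logits[0, -1].argmax().item()
    print("emitted token:", repr(tokenizer.decode(next_id)))

    # The hidden part: one activation tensor per layer. For GPT-2 small
    # that is 13 tensors (embeddings + 12 layers), each (batch, seq, 768).
    floats = sum(h.numel() for h in out.hidden_states)
    print(f"{floats} floats of internal state behind that single token")

Cutting the token budget cuts how many of these forward passes the model gets to spend on a problem, which is the sense in which fewer tokens can mean less thinking.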
DonHopkins 7 hours ago:
High-dimensional vectors are thought (insofar as you can define what that even means). Tokens are the one-dimensional input that steers that thought, and the one-dimensional output that renders it. The "thinking" takes place in the high-dimensional space, not in the one-dimensional stream of tokens.
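The same idea from the other side, sketched under the same assumptions as above (GPT-2 via transformers, as a stand-in): tokens only enter the model through an embedding lookup into that high-dimensional space, and only leave it through a projection back down onto the vocabulary.

    # Tokens are the one-dimensional interface; everything in between
    # is vector arithmetic in a 768-dimensional space (for GPT-2 small).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tokenizer("a thought", return_tensors="pt").input_ids
    print(ids.shape)   # (1, seq_len): the one-dimensional token stream

    # Input side: each integer id becomes a 768-dimensional vector.
    vecs = model.get_input_embeddings()(ids)
    print(vecs.shape)  # (1, seq_len, 768): where the "thinking" happens

    # Output side: a linear projection collapses a 768-dim vector back
    # into scores over ~50k tokens; sampling then keeps just one id.
    # (Applied to the raw embeddings here purely to show the shapes.)
    logits = model.lm_head(vecs)
    print(logits.shape)  # (1, seq_len, 50257)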
| ||||||||||||||||||||||||||