gpm · 5 hours ago
It strikes me that more tokens likely give the LLM more time/space to "think". Also, more redundant tokens, like local type declarations instead of type inference from far away, likely reduce the portion of the code LLMs (and humans) have to read. So I'm not convinced this is the right metric, or, even if it is, that it's a metric you want to minimize. (A small sketch of the type-declaration point follows below.)
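A minimal TypeScript sketch of that point, with hypothetical names (QuarterlyReport, buildReport) standing in for definitions that would normally live far away in a real codebase:

    // Hypothetical stand-ins for definitions located elsewhere in a codebase.
    interface QuarterlyReport {
      quarter: string;
      revenue: number;
    }

    function buildReport(revenue: number): QuarterlyReport {
      return { quarter: "Q1", revenue };
    }

    // Inference only: the reader (or model) must trace buildReport's
    // definition to learn the shape of the result.
    const inferred = buildReport(1_000_000);

    // Explicit local annotation: more "redundant" tokens, but the type is
    // visible at the use site, so less surrounding code has to be read.
    const annotated: QuarterlyReport = buildReport(1_000_000);

    console.log(inferred.revenue, annotated.quarter);

Both lines compile to the same thing; the annotation only changes how much context a reader needs to pull in.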
limoce · 5 hours ago
I think separating thinking tokens from "representing" tokens might be a better approach, like what thinking models do.
make3 · 3 hours ago
With chain of thought (text thinking), the models can already use as much compute as they want in any language (determined by reinforcement learning training).