Aurornis 5 hours ago
Cool visualization, but most of the token generation in my sessions doesn't go to output code or even to the text I see. Reasoning tokens make up most of the output, and they can only start after the input files and context have been processed. For non-trivial work I go through hundreds of thousands of tokens (combined prefill + token generation, of course) before even getting to any useful text output. I mostly use LLMs for exploration and studies, rarely for code generation, so prefill matters heavily for me. Even at prefill rates in the high hundreds or low thousands of tokens per second, I spend a lot of time waiting on the LLM (doing other things, not twiddling my thumbs).
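To put rough numbers on the waiting, here is a back-of-the-envelope sketch. The function name `wait_seconds` and every figure in it are illustrative assumptions, not measurements from any particular model or machine; it treats prefill and generation as two simple sequential phases.

```python
def wait_seconds(prefill_tokens, prefill_rate, gen_tokens, gen_rate):
    """Return (prefill_s, generation_s, total_s) for one session.

    Rates are in tokens per second. This ignores caching, batching,
    and anything else that would change the effective rates.
    """
    prefill_s = prefill_tokens / prefill_rate
    generation_s = gen_tokens / gen_rate
    return prefill_s, generation_s, prefill_s + generation_s


# Assumed workload: 300k tokens of input context at 1,000 tok/s prefill,
# then 20k tokens of (mostly reasoning) output at 50 tok/s.
prefill_s, gen_s, total_s = wait_seconds(300_000, 1_000, 20_000, 50)
print(f"prefill:    {prefill_s / 60:.1f} min")
print(f"generation: {gen_s / 60:.1f} min")
print(f"total:      {total_s / 60:.1f} min")
```

Under these assumed numbers, prefill alone takes several minutes before the first reasoning token appears, which is why the prefill rate matters for context-heavy work even when little code is generated.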