crishoj (8 hours ago):
Any idea what "output token efficiency" refers to? Gemini Flash is billed by the number of input/output tokens, which I assume is fixed for the same output, so I'm struggling to understand how it could result in lower cost. Unless, of course, they have changed tokenization in the new version?
Romario77 (6 hours ago):
They provide the answer in fewer words (while still conveying what needs to be said). Which is a good thing in my book, as models are now far too verbose (and I suspect billing by tokens is one of the reasons).
minimaxir (8 hours ago):
The post implies that the new models are better at thinking, so less time/cost is spent overall. The first chart implies the gains are minimal for non-thinking models.
kaspermarstal (6 hours ago):
The models are less verbose, so they produce fewer output tokens, so answers cost less.
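The arithmetic behind the thread's answer can be sketched as follows. Note the price and token counts below are illustrative assumptions, not actual Gemini Flash rates: under per-token billing, a terser answer to the same question simply multiplies out to a smaller bill, with no change to tokenization required.

```python
# Hypothetical per-token price -- NOT the real Gemini Flash rate.
PRICE_PER_OUTPUT_TOKEN = 2.50 / 1_000_000  # assume $2.50 per 1M output tokens

def output_cost(num_output_tokens: int) -> float:
    """Cost attributable to output tokens alone (input tokens billed separately)."""
    return num_output_tokens * PRICE_PER_OUTPUT_TOKEN

verbose_answer_tokens = 800  # illustrative: a wordier model's answer
concise_answer_tokens = 500  # illustrative: a terser model's answer to the same prompt

savings = output_cost(verbose_answer_tokens) - output_cost(concise_answer_tokens)
print(f"Savings per answer: ${savings:.6f}")
```

So "output token efficiency" can lower cost without any retokenization: the per-token price is unchanged, but the token count itself shrinks.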