crishoj 8 hours ago

Any idea what "output token efficiency" refers to? Gemini Flash is billed by number of input/output tokens, which I assume is fixed for the same output, so I'm struggling to understand how it could result in lower cost. Unless of course they have changed tokenization in the new version?

Romario77 6 hours ago | parent | next [-]

They provide the answer in fewer words (while still conveying what needed to be said).

Which is a good thing in my book as the models now are way too verbose (and I suspect one of the reasons is the billing by tokens).

minimaxir 8 hours ago | parent | prev | next [-]

The post implies that the new models are better at thinking, so less time and cost are spent overall.

The first chart implies the gains are minimal for nonthinking models.

kaspermarstal 6 hours ago | parent | prev [-]

The models are less verbose, so they produce fewer output tokens, so answers cost less.
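
To make the arithmetic in this thread concrete, here is a rough sketch of per-token billing. The prices and token counts are made-up placeholders, not actual Gemini Flash rates, and `request_cost` is a hypothetical helper rather than part of any SDK. It only shows that with the per-token price unchanged, a shorter answer to the same prompt costs less.

```python
# Hypothetical prices in USD per 1M tokens (assumed for illustration,
# NOT actual Gemini Flash pricing).
INPUT_PRICE_PER_M = 0.30
OUTPUT_PRICE_PER_M = 2.50


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request when input and output tokens are billed separately."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Same prompt (same input tokens); only the length of the answer differs.
verbose = request_cost(input_tokens=1_000, output_tokens=800)
concise = request_cost(input_tokens=1_000, output_tokens=500)

print(f"verbose answer: ${verbose:.6f}")
print(f"concise answer: ${concise:.6f}")
print(f"saving: {100 * (verbose - concise) / verbose:.1f}%")
```

For thinking models, any reasoning tokens that are billed as output would be added to `output_tokens` in the same way, which would be consistent with the point above that the gains show up mostly there.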