Remix clone Hacker News

	▲	crishoj 8 hours ago
		Any idea what "output token efficiency" refers to? Gemini Flash is billed by number of input/output tokens, which I assume is fixed for the same output, so I'm struggling to understand how it could result in lower cost. Unless of course they have changed tokenization in the new version?
	▲	Romario77 6 hours ago
		They provide the answer in fewer words (while still conveying what needs to be said). That's a good thing in my book, as the models are now way too verbose (and I suspect billing by tokens is one of the reasons).
	▲	minimaxir 8 hours ago
		The post implies that the new models are better at thinking, so less time/cost is spent overall. The first chart implies the gains are minimal for non-thinking models.
	▲	kaspermarstal 6 hours ago
		Models are less verbose, so they produce fewer output tokens, so answers cost less.
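To make the arithmetic in the replies concrete, here is a rough sketch of per-token billing. The prices and token counts are made-up placeholders, not actual Gemini Flash rates, and `request_cost` is a hypothetical helper rather than part of any SDK. The point is only that the per-token price stays the same while the number of output tokens for an equivalent answer drops.

```python
# Hypothetical prices in USD per 1M tokens; NOT actual Gemini Flash rates.
INPUT_PRICE_PER_M = 0.30
OUTPUT_PRICE_PER_M = 2.50


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request when billing is linear in token counts."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Same prompt and same per-token prices; only the answer length differs.
verbose = request_cost(input_tokens=1_000, output_tokens=800)
concise = request_cost(input_tokens=1_000, output_tokens=500)

print(f"verbose answer: ${verbose:.6f}")
print(f"concise answer: ${concise:.6f}")
print(f"saved per request: ${verbose - concise:.6f}")
```

If thinking tokens are billed as output, which is an assumption here, a model that reasons more briefly would lower `output_tokens` the same way.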