I'm definitely seeing 4.8 use fewer tokens than 4.7 did to accomplish the same outcomes. And the key is, I think it takes less wall-clock time for me to get the outcomes I want with 4.8 - it's more likely to get things right the first time.

I think this is us humans mistaking "it's getting more done" for "it costs more". I think the amount getting done has scaled quite a bit faster than the token burn.

voxl 8 hours ago | parent [-]

My eyes rolled so far back I'm blind

	pvankessel 7 hours ago | parent [-]
		I'm with you. I haven't been materially more satisfied with the code or reasoning from 4.8 than I was with 4.7. But I'm also not vibe coding; I'm reviewing all of the output. Maybe 4.8 has been making fewer mistakes that I would otherwise have had to correct, but I was perfectly happy going through a few iterations with 4.7 to get it over the finish line. This trend has me startled, and I'm now realizing that my workflow will need to shift to open-weight models very soon. They're cranking up the costs, and there's no way I can get my employer to cover what's apparently become $2k a day in token use.