radicality 11 hours ago
Do you have more info about the inflated token use? I'm using Codex CLI a bunch now, but the reported token usage seems like an order of magnitude higher than, say, Claude Code with Opus. I don't know if it's because I set Codex to xhigh reasoning, but even then it still seems way higher than Claude. The input/output ratio feels large too, e.g. I have a Codex session which reports ~500M tokens in / ~2M out.
moralestapia 10 hours ago | parent
I wish I had hard evidence, but it is mostly an observation. I do use Codex a lot and I felt a drastic change from one or two months ago to today. It used to give me precise answers; "surgical" is how I described it to my friends. Now it generates a lot of slop and plenty of "follow ups". It doesn't give me wrong answers, which is fine, but things that used to take 3-4 prompts now take 8-10. Obviously my prompting skills haven't changed much and, if anything, they've gotten better. Other colleagues have observed this as well. Even the same GPT5.4 model feels different and more chatty recently. Btw, I think their version numbers mean nothing; no one can be certain about the model that is actually running on the backend, and it is pretty evident that they're continuously "improving" it.