behnamoh 8 hours ago

but that's why I like Codex CLI, it's so bare-bones and lightweight that I can build lots of tools on top of it. persistent thinking tokens? let me have that by using a separate file the AI writes to. the reasoning tokens we see aren't the actual tokens anyway; the model does a lot more behind the scenes, but the API keeps it hidden (all providers do that).
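(the "separate file" trick could be sketched roughly like this: keep a scratchpad file the model appends its working notes to, then prepend those notes to the next prompt. the filename and helper names here are hypothetical, just to illustrate the idea.)

```python
# Hypothetical sketch: persist the model's working notes across turns in a
# scratchpad file, since API-side reasoning tokens are hidden and not
# carried over between requests.
from pathlib import Path

SCRATCHPAD = Path("scratchpad.md")  # hypothetical filename

def load_notes() -> str:
    """Read prior notes so they can be prepended to the next prompt."""
    return SCRATCHPAD.read_text() if SCRATCHPAD.exists() else ""

def save_notes(new_notes: str) -> None:
    """Append this turn's notes so the next turn can see them."""
    with SCRATCHPAD.open("a") as f:
        f.write(new_notes.rstrip() + "\n")

def build_prompt(user_msg: str) -> str:
    """Prefix the user message with any accumulated notes."""
    notes = load_notes()
    prefix = f"Prior notes:\n{notes}\n" if notes else ""
    return prefix + user_msg

# example: the model records a note, and the next turn's prompt includes it
save_notes("- user prefers small, focused diffs")
print(build_prompt("Refactor the parser."))
```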

postalcoder 7 hours ago | parent [-]

Codex is wicked efficient with context windows, at the cost of time spent. It hurts the flow state, but overall I've found it's the best for long conversations/coding sessions.

behnamoh 7 hours ago | parent [-]

yeah it throws me out of the "flow", which I don't like. maybe the cerebras deal helps with that.

postalcoder 7 hours ago | parent [-]

It's worth it at the end of the day because it tends to properly scope out changes and generate complete edits, whereas with Opus I always have to come back around to fix things it missed or manually loop in some piece of context it didn't find.

That said, faster inference can't come soon enough.

behnamoh 7 hours ago | parent [-]

> That said, faster inference can't come soon enough.

why is that? technical limits? I know cerebras struggles with compute, and they stopped their coding plan (sold out!). their architecture also hasn't been used with large models like gpt-5.2; the largest they support (if not quantized) is glm 4.7, which is <500B params.