| ▲ | postalcoder 21 hours ago |
| It has very quickly become unfashionable for people to say they like the Codex CLI. I still enjoy working with it, and my only complaint is that its speed makes it less than ideal for pair coding. On top of that, the Codex CLI team is responsive on GitHub, and it's clear that user complaints make their way to the team responsible for fine-tuning these models. I run bake-offs between all three models, and GPT-5.2 generally has a higher success rate at implementing features, followed closely by Opus 4.5 and then Gemini 3, which has trouble with agentic coding. I'm interested to see how 5.2-codex behaves. I haven't been a fan of the codex models in general. |
|
| ▲ | jbm 21 hours ago | parent | next [-] |
| When Claude screws up a task I use Codex, and vice versa. It helps a lot when I'm working on libraries I've never touched before, especially iOS-related ones. (Also, I can't imagine who is blessed with so much spare time that they would look down on an assistant that does decent work.) |
| |
▲ | embedding-shape 20 hours ago | parent [-] | | > When Claude screws up a task I use Codex and vice versa Yeah, it feels really strange sometimes. You bump up against something that Codex seemingly can't work out, you give it to Claude, and suddenly it's easy. Then you continue with Claude and eventually it gets stuck on something, and you try Codex, which gets it immediately. My guess would be that the training data differs just enough for it to have an impact. | | |
▲ | extr 20 hours ago | parent | next [-] | | I think Claude is more practically minded. I find that OAI models in general default to the most technically correct, expensive (in terms of LoC implementation cost, possible future maintenance burden, etc.) solution. Whereas Claude will take a look at the codebase and say "Looks like a webshit React app, why don't you just do XYZ which gets you 90% of the way there in 3 lines". But if you want that last 10%, Codex is vital. Edit: Literally right after I typed this, it happened again. Codex 5.2 reports a P1 bug in a PR. I look closely, and I'm not actually sure it's a "bug". I take it to Claude. Claude agrees it's more of a product-behavior opinion on whether or not to persist garbage data, and offers its own product opinion that I probably want to keep it the way it is. Codex 5.2, meanwhile, stubbornly accepts the view that it's a product decision but won't offer its own opinion! | | |
▲ | deaux 13 hours ago | parent [-] | | Correct, this has been true of the whole GPT-5 series. They produce much more "enterprise" code by default, sticking to "best practices", so people who need that kind of code will much prefer them. Claude models tend to adapt more to the existing level of the codebase, defaulting to more lightweight solutions. Gemini 3 hasn't been out long enough to gauge yet, but so far it seems somewhere in between. |
| |
| ▲ | enraged_camel 19 hours ago | parent | prev [-] | | >> My guess would be that the training data differs just enough for it to have an impact. It's because performance degrades over longer conversations, which decreases the chance that the same conversation will result in a solution, and increases the chance that a new one will. I suspect you would get the same result even if you didn't switch to a different model. | | |
▲ | XenophileJKO 19 hours ago | parent [-] | | Not really. Models certainly degrade to some degree on context retrieval, but in Cursor you can change the model used for a single exchange while keeping the same long context, and you still see the models' strengths and weaknesses contrasted. They just have different strengths and weaknesses. | |
▲ | grimgrin 15 hours ago | parent [-] | | If Claude is stuck on a thing but we've made progress (even if that progress is just process of elimination) and it's 120k tokens deep, I'll often have Claude distill our learnings into a file and /clear to start again with that file. I'll usually get quicker success, which is analogous to taking your problem to another model and ideally feeding it some sort of lesson. I guess this is a specific example, but it's one I play out a lot. Starting completely fresh with the same problem is unusual for me; there's usually a lesson I'm feeding it from the start. |
|
|
|
|
|
| ▲ | qsort 21 hours ago | parent | prev | next [-] |
| I care very little about fashion, whether in clothes or in computers. I've always liked Anthropic products a bit more, but Codex is excellent; if that's your jam, more power to you. |
| |
|
| ▲ | EnPissant 18 hours ago | parent | prev | next [-] |
| Claude Code is just a better CLI:
- Planning mode. Codex is extremely frustrating: you have to constantly tell it not to edit when you talk to it, and even then it will sometimes just start working.
- Better terminal rendering (Codex seems to go for a "clean" look at the cost of clearly distinguished output).
- It prompts you for questions using menus.
- Sub-agents don't pollute your context. |
|
| ▲ | dingnuts 21 hours ago | parent | prev [-] |
| The faddish nature of these tools fits the narrative of the METR findings that the tools slow you down while making you feel faster. Since nobody (other than that paper) has been trying to measure output, everything is based on feelings and fashion, like you say. I'm still raw-dogging my code. I'll start using these tools when someone can measure the increase in output. Leadership at work is beginning to claim they can, so maybe the writing is on the wall for me. They haven't shown their methodology for what they're measuring, just telling everyone they "can tell". But until then, I can spot too many psychological biases inherent in their use to trust my own judgement, especially when the only real study done so far on this subject shows that our intuition lies about this. And in the meantime, I've already lost time investigating reasonable-looking open source projects that turned out to be 1) vibe-coded and 2) fully non-functional even in the most trivial use. I'm so sick of it. I need a new career. |