jbm 21 hours ago

When Claude screws up a task I use Codex and vice versa. It helps a lot when I'm working on libraries that I've never touched before, especially iOS-related ones.

(Also, I can't imagine who is blessed with so much spare time that they would look down on an assistant that does decent work)

embedding-shape 20 hours ago | parent [-]

> When Claude screws up a task I use Codex and vice versa

Yeah, it feels really strange sometimes. You bump up against something that Codex seemingly can't work out, give it to Claude, and suddenly it's easy. Then you continue with Claude until it eventually gets stuck on something, try Codex, and it gets it immediately. My guess would be that the training data differs just enough for it to have an impact.

extr 20 hours ago | parent | next [-]

I think Claude is more practically minded. I find that OAI models in general default to the most technically correct but expensive solution (in terms of implementation cost in LoC, possible future maintenance burden, etc.), whereas Claude will take a look at the codebase and say "Looks like a webshit React app, why don't you just do XYZ, which gets you 90% of the way there in 3 lines."

But if you want that last 10%, Codex is vital.

Edit: Literally right after I typed this, it happened. Codex 5.2 reports a P1 bug in a PR. I look closely and I'm not actually sure it's a "bug". I take it to Claude. Claude agrees it's more of a product-behavior question (whether or not to persist garbage data) and offers its own product opinion that I probably want to keep it the way it is. Codex 5.2, meanwhile, accepts the view that it's a product decision but stubbornly won't offer an opinion of its own!

deaux 14 hours ago | parent [-]

Correct, and this has been true across the GPT-5 series. They produce much more "enterprise" code by default, sticking to "best practices", so people who need such code will much prefer them. Claude models tend to adapt more to the existing level of the codebase, defaulting to more lightweight solutions. Gemini 3 hasn't been out long enough to gauge yet, but so far it seems to land somewhere in between.

enraged_camel 19 hours ago | parent | prev [-]

> My guess would be that the training data differs just enough for it to have an impact.

It's because performance degrades over longer conversations, which decreases the chance that the same conversation will result in a solution, and increases the chance that a new one will. I suspect you would get the same result even if you didn't switch to a different model.

XenophileJKO 19 hours ago | parent [-]

Not really. Models certainly degrade to some degree on context retrieval, but in Cursor you can just swap the model used for the next exchange while keeping the same long context, and the contrast still shows up.

They just have different strengths and weaknesses.

grimgrin 15 hours ago | parent [-]

if claude is stuck on a thing but we've made progress (even if that progress is just process of elimination) and it's 120k tokens deep, i'll often have claude distill our learnings into a file, /clear to start again with said file, and get to success quicker

which is analogous to taking your problem to another model and ideally feeding it some sorta lesson

i guess this is a specific example, but one i play out a lot. starting completely fresh with the same problem is unusual for me; there's usually a lesson i'm feeding it from the start
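
roughly what that looks like for me (the file name and the prompts are just whatever i make up on the spot, nothing special about them):

    "before we stop, write everything we've learned about this bug to LEARNINGS.md: what we ruled out, what we still suspect, and the exact repro steps"
    /clear
    "read LEARNINGS.md and pick up the investigation from there"

the distilled file is way smaller than the 120k-token transcript, so the fresh session starts with the lesson instead of the whole meandering history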