CjHuber 21 hours ago

Somehow Codex for me is always way worse than the base models.

Especially in the CLI, it seems so eager to start writing code that nothing can stop it, not even the best Agents.md.

Asking it a question or telling it to check something doesn't mean it should start editing code. It means answer the question. All models have this issue to some degree, but Codex is the worst offender for me.

w-m 20 hours ago

Just use the non-codex models for investigation and planning, they listen to "do not edit any files yet, just reply here in chat". And they're better at getting the bigger picture. Then you can use the -codex variant for execution of a carefully drafted plan.

JeremyNT 20 hours ago

Same experience here.

I see people gushing over these codex models, but they seem worse than the big gpt models in my own actual use (e.g. I'll give the same prompt to gpt-5.1 and gpt-5.1-codex, and codex will give me functional but weird/ugly code, whereas gpt-5.1's code is cleaner).

embedding-shape 20 hours ago

> Somehow Codex for me is always way worse than the base models.

I feel the same. CodexTheModel (why have two things named the same way?!) is a good deal faster than the other models, so it probably sits somewhere else on the speed/accuracy scale, but I want most code to be as high quality as possible, and the base models do seem better at that than CodexTheModel.

6thbit 19 hours ago

Agreed. They are working on a plan mode that should hopefully alleviate this.

What has somewhat worked for me atm is asking it to only update a .md plan file and to act only on that file. That seems to appease its eagerness to write files.

flir 16 hours ago

"Don't write any code yet, we're just having a discussion" - works for me, ymmv etc.

nowittyusername 19 hours ago

I've had this issue as well since the codex models were introduced. I tried them, but 5.1 regular on high thinking always worked better for me. I think it's because its thinking is deeper and more nuanced, so it seemed to understand better what needed doing. I did have to interact with it more often than with Codex, which just worked by itself for a long time, but those interactions were worth it for the reduction in assumptions and other stuff Codex made. I'm gonna try 5.2 Codex today and hope that changes, but so far I've been happy with base 5.1 high thinking.