jumploops 21 hours ago:
Interesting, the new model uses a different prompt in Codex CLI that's about half the size (10KB vs. 23KB) of the previous one [0][1]. SWE-bench performance is similar to regular gpt-5, so the main delta with `gpt-5-codex` seems to be on code refactors (their internal refactor benchmark: 33.9% -> 51.3%).

As someone who recently used Codex CLI (`gpt-5-high`) for a relatively large refactor (moving multiple internal libs to dedicated packages), I kept running into bugs introduced when the model would delete a file and then rewrite it, missing crucial details. My approach would have been to just copy the file over and then make package-specific changes, so maybe better tool calling is at play here.

Additionally, they claim the new model is more steerable (both with AGENTS.md and generally). In my experience, Codex CLI with gpt-5 is already a lot more steerable than Claude Code, but any improvements are welcome!

[0] https://github.com/openai/codex/blob/main/codex-rs/core/gpt_...
[1] https://github.com/openai/codex/blob/main/codex-rs/core/prom...

(comment reposted from other thread)
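The copy-then-edit approach mentioned above can be sketched as a few shell commands. The paths and file contents here are purely illustrative (not from the thread); the point is that copying preserves the file byte-for-byte, and the original is only deleted after the copy is verified:

```shell
set -e
# Hypothetical layout: moving an internal lib into a dedicated package.
mkdir -p internal/utils packages/utils/src
printf 'module.exports = { clamp: (n, lo, hi) => Math.min(hi, Math.max(lo, n)) };\n' \
  > internal/utils/clamp.js

# Copy rather than delete-and-rewrite: nothing is lost in transit.
cp internal/utils/clamp.js packages/utils/src/clamp.js

# Make package-specific edits on the copy here, then verify before
# removing the original (cmp exits non-zero if the files differ).
cmp internal/utils/clamp.js packages/utils/src/clamp.js \
  && rm internal/utils/clamp.js
```

A delete-first workflow has no such safety net: once the original is gone, any detail the rewrite drops is simply lost.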
faangguyindia 13 hours ago (reply):
I do not trust SWE-bench. Here I am using Gemini 2.5 Pro and single-shotting most features: https://www.reddit.com/r/ChatGPTCoding/comments/1nh7bu1/3_ph...
robotswantdata 21 hours ago (reply):
Saw the same behaviour. What worked was getting it to first write a detailed implementation plan for a "junior contractor", then attempt it in phases (clearing the task window each time), with instructions to use /tmp to copy files, transform them, and then update the original. Looking forward to trying the new model out on the next refactor!
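The /tmp staging step described above might look roughly like this. File names and the transform itself are hypothetical; the idea is that the original file is never touched until the transformed copy checks out:

```shell
set -e
# Illustrative source file (not from the thread).
mkdir -p src
printf 'def fetch(url):\n    return url\n' > src/legacy_api.py

# 1. Copy the file into a scratch dir so the original survives a botched edit.
work="$(mktemp -d)"
cp src/legacy_api.py "$work/legacy_api.py"

# 2. Transform the copy (here: a trivial rename via sed).
sed 's/def fetch/def fetch_url/' "$work/legacy_api.py" > "$work/new_api.py"

# 3. Only overwrite the original once the transform is verified.
grep -q 'def fetch_url' "$work/new_api.py" && cp "$work/new_api.py" src/legacy_api.py
```

Combined with clearing the task window between phases, each phase starts from files that are known-good rather than half-rewritten.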