EnPissant, a day ago:
My experience with Codex / GPT-5:

- The smartest model I have used. Solves problems better than Opus 4.1.
- It can be lazy. With Claude Code / Opus, once given a problem, it will generally work until completion. Codex will often perform only the first few steps and then ask if I want it to continue with the rest. It does this even if I tell it not to stop until completion.
- I have seen severe degradation near max context. For example, I have seen it just repeat the next steps every time I tell it to continue, and I have to manually compact.

I'm not sure if the problems are GPT-5 or Codex. I suspect a better Codex could resolve them.
brookst, a day ago:
Claude seems to have gotten worse for me, with both that kind of laziness and a new pattern: it will write the test, write the code, run the test, and then declare that the test is passing perfectly but that there are problems in the (new) code that need to be fixed. Very frustrating, and happening more often.
| ||||||||||||||||||||||||||||||||||||||
M4v3R, a day ago:
Context degradation is a real problem with all frontier LLMs. As a rule of thumb I try to never exceed 50% of available context window when working with either Claude Sonnet 4 or GPT-5 since the quality drops really fast from there. | ||||||||||||||||||||||||||||||||||||||
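The 50% rule of thumb above is easy to automate. Here is a minimal sketch, assuming a hypothetical 200k-token context window and the common rough heuristic of ~4 characters per token (a real implementation would use the model's own tokenizer, e.g. via a library like tiktoken, and the window size of the specific model):

```python
# Sketch of the "stay under 50% of the context window" rule of thumb.
# Both constants below are assumptions for illustration, not real limits
# of any particular model.

CONTEXT_WINDOW = 200_000          # tokens; assumed figure, varies by model
BUDGET = CONTEXT_WINDOW // 2      # the 50% rule of thumb


def estimate_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token for English prose/code."""
    return max(1, len(text) // 4)


def should_compact(conversation: list[str]) -> bool:
    """True once the estimated conversation size crosses the 50% budget,
    signalling it's time to compact or start a fresh session."""
    used = sum(estimate_tokens(msg) for msg in conversation)
    return used > BUDGET
```

The point is simply to trigger compaction well before the hard limit, since (as noted above) quality tends to degrade long before the context window is actually full.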
| ||||||||||||||||||||||||||||||||||||||
apigalore, 16 hours ago:
Yes, this is the one thing stopping me from switching to Codex completely. Currently, it's kind of annoying that Codex stops often and asks me what to do, and I just reply "continue" — even though I already gave it a checklist.

With GPT-5-Codex they do write: "During testing, we've seen GPT-5-Codex work independently for more than 7 hours at a time on large, complex tasks, iterating on its implementation, fixing test failures, and ultimately delivering a successful implementation."

https://openai.com/index/introducing-upgrades-to-codex/
bayesianbot, a day ago:
I definitely agree with all of those points. I just really prefer it completing a step and asking me whether we should continue to the next one, rather than doing half of the step and telling me it's done. And the context degradation seems quite random — sometimes it hits way earlier, sometimes we go through a crazy number of tokens and it all works out.
tanvach, a day ago:
I also noticed the laziness compared to the Sonnet models, but now I feel it's actually a good feature. The Sonnet models, I now realize, are way too eager to hammer out code, with a much higher likelihood of bugs.