taurath 5 hours ago

These 2 sentences right next to each other stood out to me:

> a new step towards becoming a reliable coding partner

> GPT‑5.1-Codex-Max is built for long-running, detailed work

Does this not sound contradictory? It's been the shorter-form work that has built what little confidence I have in these as a coding partner; a model that goes off and does work without supervision is not a partner to me.

causal 5 hours ago | parent | next [-]

Absolutely contradictory. Codex's tendency to run long is why I cannot understand the hype around it: if you bother to watch what it does and read its code, the approaches it takes are absolutely horrifying. It would rather rewrite a TLS library from scratch than bother to ask you if the network is available.

meowface 2 hours ago | parent | next [-]

>It would rather rewrite a TLS library from scratch than bother to ask you if the network is available.

This is definitely one of the biggest issues with coding agents at the moment.

That said, from my experience, Codex so often does things that are so useful and save me so much time that the occasional "oh god, what the hell did it just go off and do" moment is an acceptable cost for me.

I regularly get great results with open-ended prompts and agents that spend 15+ minutes working on the task. I'm sure they'll eventually get better at common sense understanding of what kind of work is wasteful/absurd.

keeganpoppen 5 hours ago | parent | prev [-]

these things are actually fixable with prompting. is it easy? no. is it PEBKAC if you don't do anything to change course as it builds a TLS library? yes, but paperclip maximized! xD
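to make that concrete, here's a minimal sketch of the kind of repo-level guidance I mean, assuming the AGENTS.md convention Codex reads; the exact wording is illustrative, not a recipe:

```markdown
# AGENTS.md (illustrative example; adapt to your repo)

- Ask before taking any action that requires network access; do not assume the network is unavailable and work around it silently.
- Prefer existing, well-maintained dependencies (e.g. for TLS/crypto); never reimplement them from scratch.
- If a task looks like it needs more than a few minutes of unattended work, pause, summarize the plan, and confirm before continuing.
```

it won't catch everything, but it shifts the default from improvising to asking.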

causal 4 hours ago | parent [-]

Or you can have a model with some semblance of common sense that will stop and say "Hey, can I have access to the network to do X?"

Codex feels like a tool designed to run after all the humans are gone.

embirico 5 hours ago | parent | prev | next [-]

(Disclaimer: Am on the Codex team.) We're basically trying to build a teammate that can do short, iterative work with you; then, as you build trust (and configuration), you can delegate longer tasks to it.

The "# of model-generated tokens per response" chart in [the blog introducing gpt-5-codex](https://openai.com/index/introducing-upgrades-to-codex/) shows an example of how we're improving the model good at both.

ntonozzi 5 hours ago | parent | prev [-]

If you haven't, give Cursor's Composer model a shot. It might not be quite as good as the top models, but in my experience it's close, and the lightning-fast feedback is more than worth the tradeoff. You can give it a task, wait ten seconds, and evaluate the results. It's quite common for the result not to be good enough, but it's no worse than Sonnet, and if it doesn't work you've wasted 30 seconds instead of 10 minutes.