philipbjorge 2 days ago

So happy to have diversified my model providers this past couple of weeks. GPT-5.5 has had no trouble slotting into Opus workloads. Will be fun to try out more of the models as time goes on to build some resiliency into my engineering workflows :).

fnordpiglet a day ago | parent | next [-]

I think if Codex can fill in some functional gaps - which shouldn't be that hard, like having defined agents in plugins the way Claude Code does - it's actually the preferable product. It's faster in every way and seems to manage context a lot better - compaction isn't an end-of-the-world event to be avoided at all costs. Add defined thinking levels, the fact that it actually seems to follow tool-calling instructions, its permissions handling, and other features, and it's frankly a better tool overall. 5.5 seems to be a reasonable model.

Anthropic seems to have killed their advantage, squandering the immense goodwill they built up through repeated blunders with the developer community over the last few months.

Tonight, for instance, after the incident had recovered, I restarted my work. On my Max account, my usage allowance was completely exhausted in 4 minutes of Sonnet subagent work. This was long after prime time, and the workload was a fraction of what I normally do.

These days I run Codex concurrently and have gotten my marketplaces, plugins, and MCPs adapted to it - other than the agents, which I do lean on heavily - and generally find it a capable replacement. Anthropic needs to take notice and get their house in order.

fooster 2 days ago | parent | prev [-]

I found GPT 5.4 terrible. I just tested 5.5, and compared with Opus it's still not great.

philipbjorge 2 days ago | parent | next [-]

What I found was that I *strongly* preferred Claude Code with its defaults. Codex was almost unusable to me -- It would spit out a 4-5 page plan where it kept repeating itself, where Claude would give me a crisp 1-2 pager I could actually review.

*But* I don't work with the defaults -- I work with my own prompt framework based off of superpowers.

Given sufficient prompt scaffolding, I've found the models relatively interchangeable. _I might_ be getting some of this for free by basing my own system off of superpowers, which is used across various harnesses -- in other words, achieving this kind of portability may be a lot harder than it looks, and I'm benefiting from other people's work.

fooster a day ago | parent | next [-]

The problem I ran into was that, using the workflow I use with Claude, the code being written wasn't good: it was missing edge cases and incomplete.

After reviewing the code, I also found it was annoying to get GPT 5.4 to actually fix the code based on my prompts, compared with Opus. I had to be far more specific and direct (which relates back to the missing edge cases, incompleteness, etc.).

threatripper a day ago | parent | prev [-]

I lack a bit of context. Can you point me to a place that explains what you use?

philipbjorge a day ago | parent [-]

I haven't really shared what I use, I'm still deciding if that's something I want to do.

To get an idea of what I'm talking about, you could install https://github.com/obra/superpowers/ into both Codex and Claude Code -- You'll find that the behavior is remarkably similar if you A/B compare them on the same problems. CC occasionally misses things that Codex gets and vice versa.

Overall the output structure and final code are remarkably similar... which is pretty different from what you get if you just run them with their default system prompts. I'd throw Codex out the window based on its default outputs.

wahnfrieden 2 days ago | parent | prev [-]

In what harness?

fooster a day ago | parent [-]

Codex. Codex is also pretty garbage compared with Claude Code. The permissioning system in Claude Code with auto mode is now pretty fantastic. With Codex the only vaguely usable mode is yolo mode, which is bad for obvious reasons.