cmrdporcupine 8 hours ago

Do this -- take your coworker's PRs that they've clearly written in Claude Code, and have Codex/GPT 5.4 review them.

Or have Codex review your own Claude Code work.

It then becomes clear just how "sloppy" CC is.

I wouldn't mind having Opus around in my back pocket to yeet out whole net new greenfield features. But I can't trust it to produce well-engineered things to my standards. Not that anybody should trust an LLM to that level, but there are matters of degree here.

kevinsync 7 hours ago | parent | next

I've been using Claude and Codex in tandem ($100 CC, $20 Codex), and have made heavy use of claude-co-commands [0] to make them talk.

Outside of the last 1-2 weeks (which gives us confirmation YET AGAIN that Claude shits the fucking bed in the run-up to a new model release), I usually put Claude on max + /plan to gin up a fever dream to implement. When the plan is presented, I tell it to /co-validate with Codex, which tends to fill in many implementation gaps. Claude then codes the amended plan and commits, and I have a Codex skill that reviews the commit for gaps, missed edge cases, incorrect implementation, missed optimizations, etc., and fixes them.

This had been working quite well up until the beginning of the month, when Claude more or less got CTE; after a week of that I swapped to the $100 Codex, $20 CC plans. Now I'm using co-validation a lot less and just driving primarily via Codex. When Claude works, it provides some good collaborative insights and counter-points, but Codex at the very least is consistently predictable (for text-oriented, data-oriented stuff -- I don't use either for designing or implementing frontend / UI / etc.).

As always, YMMV!

[0] https://github.com/SnakeO/claude-co-commands
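The plan → co-validate → implement → review loop described above can be sketched as a small orchestration script. This is a minimal illustration, not claude-co-commands' actual implementation; the agent commands passed in are hypothetical placeholders, and you'd substitute whatever CLI invocations drive your actual author and reviewer agents.

```python
"""Sketch of an author/reviewer agent loop. The CLI commands are
placeholders -- swap in real invocations for your own agents."""
import subprocess


def run_agent(cmd: list[str], prompt: str) -> str:
    """Run one agent CLI, feeding the prompt on stdin, and return its output."""
    result = subprocess.run(cmd, input=prompt, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def plan_validate_review(author: list[str], reviewer: list[str],
                         task: str) -> dict:
    """Author drafts a plan, reviewer amends it, author implements the
    amended plan, and the reviewer critiques the result."""
    plan = run_agent(author, f"Draft an implementation plan for: {task}")
    amended = run_agent(reviewer, f"Fill gaps in this plan:\n{plan}")
    diff = run_agent(author, f"Implement this plan:\n{amended}")
    critique = run_agent(reviewer,
                         f"Review for edge cases and gaps:\n{diff}")
    return {"plan": amended, "diff": diff, "critique": critique}
```

The human stays in the loop between steps in practice; this just shows the hand-off shape, with each model checking the other's blind spots.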

hulk-konen 3 hours ago | parent | next

Some variation of this is the way.

You should not get dependent on one black box. Companies will exploit that dependency.

My version of this is having CC Pro, Cursor Pro, and OpenCode (with $10 to Codex/GLM 5.1) --> total $50. My work doesn't stop if one of these has overloaded servers, etc. And it's definitely useful to have them cross-checking each other's plans and work.

cmrdporcupine 7 hours ago | parent | prev

This more or less mimics a flow that I had fairly good results from -- but I'm unwilling to pay for both right now unless I have a client or employer willing to foot the bill.

Claude Code as "author" and a $20 Codex as reviewer/planner/tester has worked for me to squeeze better value out of the CC plan. But with the new $100 Codex plan, and with the way Anthropic seems to have nerfed their own $100 plan, I'm not doing this anymore.

afavour 8 hours ago | parent | prev | next

> It then becomes clear just how "sloppy" CC is.

Have you done the reverse? In my experience models will always find something to criticize in another model's work.

cmrdporcupine 8 hours ago | parent

I have, and in fact models will find things to criticize in their own work, too, so it's good to iterate.

But I've had the best results with GPT 5.4.

woadwarrior01 8 hours ago | parent | prev

It cuts both ways. What I usually do these days is let Codex write code, then use Claude Code's /simplify, have both Codex and Claude Code review the PR, then finally review and fix up things myself. It's still ~2x faster than doing everything by myself.

cmrdporcupine 7 hours ago | parent

I often work this way too, but I'll say this:

This flow is exhausting. A day of working this way leaves me much more drained than traditional old school coding.

woadwarrior01 7 hours ago | parent

100%. On days when I'm sleep deprived (once or twice a week), I fall back to this flow. On regular days, I tend to write more code the old school way and use these things for review.