bobjordan a day ago

I have max plans for both, and over the past 5+ months I've built a custom "agent swarm" orchestrator with a database-backed API and several skills CLIs that the agents use to deliver orchestrated software factory runs.

We can use several different topologies (2 or 3 agents, etc.), but currently we primarily use pair-programming teams consisting of an opus4.7 for implementation and a codex5.5 for plan and code reviews, plus a codex5.5 run-manager that pushes the agent lanes along, keeps things moving if they get stuck, and settles any reviews that get escalated to it.

Escalation to the run-manager is a pretty regular thing, as codex5.5 is generally picky and thorough and opus4.7 pushes back at times. After three codex rejections we allow opus4.7 to escalate to a run-manager decision to settle it. Usually opus4.7 agrees and continues iterating, but it's not unusual for it to push back and escalate.
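To make the escalation rule concrete, here is a minimal Python sketch of one paired lane. It is not the actual orchestrator: the three-rejection threshold is the only detail taken from the comment above, while `run_review_loop`, `LaneResult`, the three callbacks, and the `max_rounds` cap with forced escalation are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# From the comment above: the implementer may escalate after three rejections.
MAX_REJECTIONS = 3


@dataclass
class LaneResult:
    outcome: str      # "approved" or "escalated"
    rounds: int       # review rounds completed
    rejections: int   # reviewer rejections seen so far


def run_review_loop(
    implement: Callable[[Optional[str]], str],
    review: Callable[[str], Tuple[bool, str]],
    wants_to_escalate: Callable[[str], bool],
    max_rounds: int = 10,  # assumption: hard cap so a lane cannot ping-pong forever
) -> LaneResult:
    """Run one implementer/reviewer lane until approval or escalation."""
    rejections = 0
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        patch = implement(feedback)          # implementer produces or revises work
        approved, feedback = review(patch)   # reviewer approves or returns feedback
        if approved:
            return LaneResult("approved", round_no, rejections)
        rejections += 1
        # Escalation is only allowed once the threshold is reached, and only
        # if the implementer chooses to push back instead of iterating again.
        if rejections >= MAX_REJECTIONS and wants_to_escalate(feedback):
            return LaneResult("escalated", round_no, rejections)
    # Assumption: if the cap is hit, hand the dispute to the run-manager anyway.
    return LaneResult("escalated", max_rounds, rejections)
```

In this sketch the implementer can keep iterating past three rejections if it agrees with the feedback, which matches the "usually agrees, sometimes escalates" behavior described above. What the run-manager does with an escalated lane is left out because the comment does not describe it.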

I've found codex5.5 is extremely capable. I just now finished a large multi-phase orchestrated swarm run with codex5.5 (xhigh) as the run-manager, presiding over 8 paired lanes, with 8 opus4.7 (high) implementers and 8 codex5.5 (high) reviewers, so 16 agents orchestrated and working in a swarm together. Codex5.5 managed that run perfectly for 14 hours with zero intervention needed by me.
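For readers trying to picture that topology, a rough Python sketch of how such a run could be described as data follows. The model names, effort levels, and lane count come from the comment above; the `Agent`, `Lane`, and `Run` classes and the `build_paired_run` helper are hypothetical and do not reflect the real orchestrator's schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Agent:
    model: str    # e.g. "opus4.7" or "codex5.5", as named in the comment
    effort: str   # reasoning effort label, e.g. "high" or "xhigh"
    role: str     # "implementer", "reviewer", or "run-manager"


@dataclass(frozen=True)
class Lane:
    lane_id: int
    implementer: Agent
    reviewer: Agent


@dataclass(frozen=True)
class Run:
    manager: Agent
    lanes: Tuple[Lane, ...]


def build_paired_run(n_lanes: int) -> Run:
    """Build one run-manager presiding over n implementer/reviewer pairs."""
    manager = Agent("codex5.5", "xhigh", "run-manager")
    lanes = tuple(
        Lane(
            lane_id=i,
            implementer=Agent("opus4.7", "high", "implementer"),
            reviewer=Agent("codex5.5", "high", "reviewer"),
        )
        for i in range(1, n_lanes + 1)
    )
    return Run(manager, lanes)


def lane_agent_count(run: Run) -> int:
    """Count agents working in lanes (two per lane, manager excluded)."""
    return 2 * len(run.lanes)


run = build_paired_run(8)
print(len(run.lanes), lane_agent_count(run))
```

With 8 lanes this gives the 16 lane agents mentioned above, with the run-manager counted separately.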

Overall, I prefer to let opus4.7 draft the plans and then let codex5.5 offer git-diff style change feedback on plans, then let opus implement and codex review/manage. This seems to get the best result for me.

bottlepalm a day ago

Yeah, I think I'm similar: Claude has better style/architecture/design, while Codex is a more critical reviewer but also writes more complex code that just works without caring as much about the bigger picture. Together they work pretty well. I don't run any swarms though; I could easily see them ping-ponging on even the simplest feature almost endlessly if I let them. How do you review all the code being generated?

bobjordan 21 hours ago

It's a lot to review since adding the AI workflows, but the bottom line is I'm not in a race. I've been working on the same repo since 2019, and I generally don't add too much at once and just take my time. But I'll admit I'm a lot more careful about backend schema, services, testing, API design, CLI design, etc., while not being overly worried about frontend items. This particular long run was focused on building frontend UI for a backend that has been painstakingly built and is ready for it. This time, I used claude.ai/design for a large amount of the UI planning. Then I just let the swarm handle it with our orchestration tool, since it's frontend. After that, I just test it in the browser and iterate on what needs to be changed.