As long as you give it deterministic goals / test criteria (it compiles, lints cleanly, passes unit and E2E tests, achieves 100% parity with an existing solution, etc.), it will brute-force its way to a solution. Codex will work for hours or days, sometimes even weeks, until it has finished. A person would never work this way, but since it just runs in the background, there’s no issue with this approach unless you need it fast.
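One way to make goals like these deterministic is to express each one as a command whose exit code means pass or fail. The sketch below assumes exactly that. The `Gate` type, the `run_gates` function, and the placeholder commands are all hypothetical, standing in for a project's real build, lint, and test invocations.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Gate:
    """One deterministic acceptance check: a name plus a command to run."""
    name: str
    cmd: list[str]


def run_gates(gates: list[Gate]) -> list[str]:
    """Run each gate in order and return the names of the gates that failed.

    An empty list means every gate passed, i.e. the task counts as done.
    """
    failed = []
    for gate in gates:
        result = subprocess.run(gate.cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(gate.name)
    return failed


if __name__ == "__main__":
    # Hypothetical placeholder gates; replace with your project's real
    # compiler, linter, unit test, and E2E test commands.
    gates = [
        Gate("compiles", [sys.executable, "-c", "print('build ok')"]),
        Gate("tests", [sys.executable, "-c", "import sys; sys.exit(0)"]),
    ]
    failed = run_gates(gates)
    print("all gates pass" if not failed else f"failing: {failed}")
```

Because every gate reduces to an exit code, "done" becomes something a script can check rather than something the agent decides on its own.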

xyzzy_plugh 2 days ago

No. It might figure out the solution, but even after many days there's no guarantee that it won't get stuck making the same mistakes over and over again, never getting any closer to a solution. I've seen this many times.

manmal 2 days ago

Getting stuck in a loop does still happen, yes. You can prevent it by running Codex in tmux and letting another agent check on its progress occasionally. That’s not even expensive: checking every 30 minutes suffices. The watchdog agent can then press Esc in tmux and send a message, maybe after doing some research, to get it unstuck.
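A rough sketch of the watchdog idea, assuming Codex runs in a tmux session named `codex` (a hypothetical name). Instead of a second agent, this swaps in a much cruder heuristic: every 30 minutes it snapshots the pane with `tmux capture-pane` and treats the session as stuck if that exact screen was already seen in one of the last few checks. When that happens, it uses `tmux send-keys` to press Escape and type a nudge. The nudge text and the helper names are made up; a real watchdog agent would look at the output and write something specific.

```python
import hashlib
import subprocess
import time
from collections import deque

SESSION = "codex"           # hypothetical tmux session running Codex
CHECK_INTERVAL_S = 30 * 60  # check every 30 minutes
HISTORY = 4                 # how many past snapshots to remember


def snapshot(session: str) -> str:
    """Return the visible text of the session's active pane."""
    return subprocess.run(
        ["tmux", "capture-pane", "-p", "-t", session],
        capture_output=True, text=True, check=True,
    ).stdout


def looks_stuck(history: deque, current: str) -> bool:
    """Crude loop check: this exact screen was already seen recently."""
    digest = hashlib.sha256(current.encode()).hexdigest()
    stuck = digest in history
    history.append(digest)
    return stuck


def nudge(session: str, message: str) -> None:
    """Interrupt the agent with Esc, then type a new instruction and submit it."""
    subprocess.run(["tmux", "send-keys", "-t", session, "Escape"], check=True)
    time.sleep(1)  # give the interrupted agent a moment before typing
    subprocess.run(["tmux", "send-keys", "-t", session, message, "Enter"], check=True)


def watch() -> None:
    """Poll forever; nudge whenever the pane looks frozen or cyclic."""
    history: deque = deque(maxlen=HISTORY)
    while True:
        time.sleep(CHECK_INTERVAL_S)
        if looks_stuck(history, snapshot(SESSION)):
            # Hypothetical nudge; a watchdog agent would tailor this message.
            nudge(SESSION, "You seem to be going in circles. Step back and try a different approach.")
            history.clear()
```

Call `watch()` from a separate process started alongside the tmux session. An identical-screen check only catches agents that are frozen or repeating verbatim; spotting "the same mistake with slightly different output" is exactly where a second model earns its keep.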

minimaxir 2 days ago

Definitely have not seen that with Opus 4.5.

manmal 2 days ago

Neither have I, personally, but I’ve seen reports that this can happen on very hard problems, where the goal just cannot be reached from a local optimum. A watchdog agent could prompt it to get unstuck by trying something new.