| ▲ | manmal 2 days ago |
| As long as you give it deterministic goals / test criteria (compiles, lints, tests, E2E tests, 100% parity with an existing solution, etc.), it will brute-force its way to a solution. Codex will work for hours or days, sometimes even weeks, until it has finished. A person would never work this way, but since this just runs in the background, there’s no issue with this approach unless you need it fast. |
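To make "deterministic goals" concrete, here is a minimal sketch of an acceptance gate that an agent could be told to satisfy. The check names and commands in `CHECKS` are placeholders, not anything from this thread; swap in the project's real build, lint, and test commands.

```python
import subprocess
import sys

# Hypothetical checks: replace with the project's real build, lint, and test commands.
CHECKS = {
    "compiles": [sys.executable, "-m", "compileall", "-q", "."],
    "tests": [sys.executable, "-m", "unittest", "discover", "-q"],
}


def run_gate(checks):
    """Run each check command and return {name: passed}.

    A check passes only if its command exits with code 0, so the result
    is a deterministic yes/no per criterion.
    """
    results = {}
    for name, cmd in checks.items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[name] = proc.returncode == 0
    return results


def goal_reached(results):
    """The goal is met only when at least one check ran and all of them passed."""
    return bool(results) and all(results.values())
```

Calling `goal_reached(run_gate(CHECKS))` gives a single boolean the agent can be asked to drive to `True`; anything fuzzier than an exit code tends to invite argument about whether the task is done.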
|
| ▲ | xyzzy_plugh 2 days ago |
| No. It might figure out the solution, but even after many days there's no assurance that it won't get stuck making the same mistakes over and over again, never getting closer to a solution. I've seen this many times. |
| |
| ▲ | manmal 2 days ago |
| Getting stuck in a loop does still happen, yes. It can be prevented if you run Codex in tmux and let another agent occasionally check on its progress. That’s not even expensive: checking every 30 minutes suffices. The watchdog agent can then press Esc in tmux and send a message, maybe do some research to get it unstuck, etc. |
| ▲ | minimaxir 2 days ago |
| Definitely have not seen that with Opus 4.5. |
| ▲ | manmal 2 days ago |
| Neither have I, personally, but I’ve seen reports that this can happen on very hard problems, where the goal just cannot be reached from a local optimum. Trying something new to get unstuck is something a watchdog agent could prompt it to do. |
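The tmux watchdog described in this thread could look roughly like the sketch below. It is a simplified stand-in, not the setup anyone here actually runs: instead of a second agent judging progress, it treats an unchanged pane between checks as a crude "stuck" signal, which will not catch a loop that keeps printing new output. The tmux target `codex:0.0` and the nudge text are assumptions; `capture-pane -p` and `send-keys` (with `-l` for literal text) are standard tmux commands.

```python
import hashlib
import subprocess
import time

TARGET = "codex:0.0"        # hypothetical tmux target (session:window.pane)
CHECK_INTERVAL_S = 30 * 60  # the thread suggests checking every 30 minutes
NUDGE = "You seem stuck. Summarize what you have tried, then try a different approach."


def capture_cmd(target):
    """tmux command that prints the visible contents of the pane to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", target]


def interrupt_cmd(target):
    """tmux command that sends the Escape key to the pane."""
    return ["tmux", "send-keys", "-t", target, "Escape"]


def message_cmds(target, text):
    """tmux commands that type `text` literally (-l) and then press Enter."""
    return [
        ["tmux", "send-keys", "-t", target, "-l", text],
        ["tmux", "send-keys", "-t", target, "Enter"],
    ]


def fingerprint(pane_text):
    """Hash of the pane contents, ignoring leading and trailing whitespace."""
    return hashlib.sha256(pane_text.strip().encode("utf-8")).hexdigest()


def looks_stuck(fingerprints, repeats=2):
    """True when the last `repeats` snapshots are identical (a crude heuristic)."""
    if len(fingerprints) < repeats:
        return False
    return len(set(fingerprints[-repeats:])) == 1


def watch(target=TARGET, interval_s=CHECK_INTERVAL_S, repeats=2):
    """Poll the pane forever; interrupt and nudge when it stops changing."""
    history = []
    while True:
        pane = subprocess.run(
            capture_cmd(target), capture_output=True, text=True, check=True
        ).stdout
        history.append(fingerprint(pane))
        if looks_stuck(history, repeats):
            subprocess.run(interrupt_cmd(target), check=True)
            for cmd in message_cmds(target, NUDGE):
                subprocess.run(cmd, check=True)
            history.clear()  # start fresh after intervening
        time.sleep(interval_s)
```

Running `watch()` in a separate terminal polls until interrupted. To get closer to the idea in the thread, replace `looks_stuck` with a call that hands the captured pane text to a second agent and lets it decide whether to intervene and what to say.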
|
|