8note 12 hours ago
Try harnesses other than Codex. I've had more success with review tools than with trying to get the agent to write quality code the first time.

Current workflow:

1. Specs/requirements/design, outputting tasks
2. Implementation, outputting code and tests
3. Run review scripts/debug loops, outputting tasks
4. Implement tasks
5. Go back to 3

The quality of the specs, tasks, and review scripts makes a big difference.

One of the biggest improvements comes from giving the agent a feedback loop from what the app actually does: good logs, being able to interact with it and take screenshots à la Playwright, etc.

Guidelines and guardrails work best when they're tools that the agent runs, or that run automatically to give feedback.
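To make steps 3 to 5 of the workflow above concrete, here is a rough sketch of a review loop in Python. It is not tied to any particular harness: `Task`, `run_check`, `review`, and `review_loop` are names made up for illustration, the check commands are placeholders for whatever linters, tests, or log scrapers you actually use, and `implement` is a hypothetical callback standing in for "hand these tasks to the agent".

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Task:
    check: str   # name of the review check that failed
    detail: str  # captured output, passed back to the agent as feedback


def run_check(name: str, cmd: Sequence[str]) -> List[Task]:
    """Run one review command; a non-zero exit code becomes a task."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return []
    output = (proc.stdout + proc.stderr).strip()
    return [Task(check=name, detail=output or f"exit code {proc.returncode}")]


def review(checks: Dict[str, Sequence[str]]) -> List[Task]:
    """Step 3: run every review script and collect the resulting tasks."""
    tasks: List[Task] = []
    for name, cmd in checks.items():
        tasks.extend(run_check(name, cmd))
    return tasks


def review_loop(
    checks: Dict[str, Sequence[str]],
    implement: Callable[[List[Task]], None],
    max_rounds: int = 5,
) -> List[Task]:
    """Steps 3-5: review, implement the tasks, repeat until clean.

    `implement` is a hypothetical hook for the agent call. Returns the
    tasks still open after `max_rounds` (an empty list means clean).
    """
    for _ in range(max_rounds):
        tasks = review(checks)
        if not tasks:
            return []
        implement(tasks)
    return review(checks)


if __name__ == "__main__":
    # Placeholder checks; swap in real commands (test runner, linter, ...).
    demo_checks = {
        "passes": [sys.executable, "-c", "pass"],
        "fails": [sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"],
    }
    for task in review(demo_checks):
        print(f"[{task.check}] {task.detail}")
```

The `max_rounds` cap is a design choice of this sketch so the loop cannot spin forever on a check the agent never manages to fix. The same shape works if a check wraps a screenshot or log capture instead of a linter, as long as it exits non-zero and prints something the agent can act on.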