furyofantares 9 hours ago
For small games I work on, I make sure Claude (well, Codex CLI) can capture screenshots of whatever screen it's working on and evaluate them. Its instructions explain how to use codex exec (the equivalent of claude -p) to run a clean instance for evaluation: it passes in a screenshot and a description of what's expected, and gets back a pass/fail plus a description of any failure. The main agent can also just view the image itself, but for checks with a clear pass/fail I prefer it invoke a clean context.
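The clean-context check might look roughly like the sketch below. It is not the setup described above. It assumes the evaluator CLI takes the prompt as its last argument and prints plain text to stdout, which is how claude -p "<prompt>" behaves. The PASS/FAIL reply format, the prompt wording and all function names (evaluate_screenshot, parse_verdict, etc.) are hypothetical. How an image actually gets attached (a path mentioned in the prompt vs. a dedicated flag) differs between tools, so check your CLI's docs.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Verdict:
    passed: bool
    reason: str


def build_prompt(screenshot_path: str, expectation: str) -> str:
    # Hypothetical prompt: asks the fresh instance for a machine-readable first line.
    return (
        f"Look at the screenshot at {screenshot_path}.\n"
        f"Expected: {expectation}\n"
        "Reply with PASS or FAIL on the first line, "
        "then one short paragraph describing any failure."
    )


def parse_verdict(output: str) -> Verdict:
    lines = output.strip().splitlines()
    if not lines:
        return Verdict(False, "evaluator returned no output")
    # Tolerate markdown bold like **PASS** on the first line.
    first = lines[0].strip().strip("*").upper()
    reason = "\n".join(lines[1:]).strip()
    if first.startswith("PASS"):
        return Verdict(True, reason)
    if first.startswith("FAIL"):
        return Verdict(False, reason)
    # Anything unparseable counts as a failure so it gets looked at.
    return Verdict(False, f"unparseable verdict: {lines[0]!r}")


def run_cli(cmd: Sequence[str], prompt: str) -> str:
    # Each call is a separate process, so the evaluator starts with a clean context.
    result = subprocess.run([*cmd, prompt], capture_output=True, text=True, check=True)
    return result.stdout


def evaluate_screenshot(
    screenshot_path: str,
    expectation: str,
    cmd: Sequence[str] = ("claude", "-p"),  # assumption: swap for your own CLI invocation
    runner: Callable[[Sequence[str], str], str] = run_cli,
) -> Verdict:
    prompt = build_prompt(screenshot_path, expectation)
    return parse_verdict(runner(cmd, prompt))
```

The runner is injectable so the parsing can be exercised without calling a model. For example, a fake runner that returns "FAIL\nScore text overlaps the health bar." yields Verdict(passed=False, reason="Score text overlaps the health bar.").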