Remix.run Logo
RS-232 4 hours ago

Has anyone had success using 2 agents, with one as the creator and one as an adversarial "reviewer"? Is the output usually better or worse?

mapontosevenths 3 hours ago | parent | next [-]

This is how its meant to be done. Usually with the reviewer being the stronger model.

That said, with both the test driven development this post describes and the reviewer model (its best to do both) you have to provide an escape hatch or out for the model. If you let the model get inescapably stuck with an impossible test or constraints it will just start deleting tests or rewriting the entire codebase in rust or something.

My escape hatch is "expert advice". I let the weak LLM phone a friend when its stuck and ask a smarter LLM for assistance. Its since stopped going crazy and replacing all my tests with gibberish... mostly.

sanxiyn 3 hours ago | parent | prev | next [-]

That works well. Anthropic wrote a writeup on it.

https://www.anthropic.com/engineering/harness-design-long-ru...

esafak 4 hours ago | parent | prev | next [-]

This is routine. We have Gemini (which is not our coding model) review our PRs and it genuinely catches mistakes. Even using the same model as the creator, without its context to bias it, would probably catch many mistakes.

peytongreen_dev 2 hours ago | parent | prev [-]

[dead]