▲ storus 8 hours ago
Wasn't the best practice to run one model/coding agent that writes the code and another one that reviews it? E.g. Claude Code for writing the code, GPT Codex to review/critique it? Different reward functions.
▲ 8note 3 hours ago | parent | next [-]
even in one agent, a different starting prompt will have you tracing a very different path through the model. maybe it still sends you to the same valley, but there are so many parameters and dimensions that i don't think that's very likely unless the answer is also correct
▲ xandrius 5 hours ago | parent | prev | next [-]
I think people are misunderstanding reward functions and LLMs. LLMs don't actually have a reward function at inference time the way some other ML models do; rewards only shape the model during training.
▲ throwatdem12311 3 hours ago | parent | prev [-]
It’s superstition that using one slop generator to “review” the slop from a different brand of slop generator somehow makes things better. It’s slop all the way down.