CuriouslyC 3 days ago
Neat. I should extend this idea to emit signals when a model veers into "This is too hard, so I'll do a toy version that I masquerade as real code, including complete bullshit test cases so you will really have to dig to find out why something isn't working in production" and "You told me to do 12 things, and hey, I just did one of them, aren't you proud of me?" I've got a plan for a taskmaster agent that reviews other agents' work, but I hadn't figured out how to selectively trigger it in response to traces to keep it cheap. This might work if extended.
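
A rough sketch of what that selective trigger could look like, assuming a trace exposes the tasks that were requested, the tasks the agent reports as done, and the agent's output text. Every name here (`Trace`, `STUB_MARKERS`, `signals`, `should_review`) is made up for illustration, and the keyword heuristics are only placeholders for whatever signal the real system would emit:

```python
from dataclasses import dataclass

# Hypothetical phrases that hint at a toy/placeholder implementation.
# A real detector would likely need something stronger than keywords.
STUB_MARKERS = (
    "todo",
    "placeholder",
    "simplified version",
    "not implemented",
    "in a real implementation",
)


@dataclass
class Trace:
    """Assumed shape of one agent trace; not taken from any real tool."""
    requested_tasks: list[str]
    completed_tasks: list[str]
    output_text: str


def signals(trace: Trace) -> list[str]:
    """Return the cheap heuristic signals that fired for this trace."""
    found = []
    text = trace.output_text.lower()
    if any(marker in text for marker in STUB_MARKERS):
        found.append("possible_toy_implementation")
    missing = set(trace.requested_tasks) - set(trace.completed_tasks)
    if missing:
        found.append(f"incomplete:{len(missing)}/{len(trace.requested_tasks)}")
    return found


def should_review(trace: Trace) -> bool:
    """Only wake the (expensive) reviewer agent when a signal fired."""
    return bool(signals(trace))
```

The point of the split is cost: `signals` is plain string and set work that can run on every trace, and the reviewer agent only gets invoked when `should_review` returns `True`.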