zingar 2 hours ago
I’m uneasy having an agent implement several pages of plan and then write tests and results only at the end of all that. It feels like getting a CS student to write and follow a plan for something they haven’t worked on before. It’ll report, “Numbers changed in step 6a, therefore it worked” [forgetting the pivotal role of step 2, which failed, meaning the agent should have taken step 6b, not 6a]. Or “there is conclusive evidence that X is present and therefore we were successful” [X is discussed in the plan as the reason why action is NEEDED, not as a success criterion]. I _think_ that what is going wrong is context overload, and my remedy is to have the agent update every step of the plan with results immediately after acting on it, before moving on to the next step. When things seem off I can then clear context and have the agent review results step by step to debug its own work: “Review step 2 of the results. Are the stated results consistent with the final conclusions? Quote lines from the results verbatim as evidence.”
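The remedy described above (write each step’s result into the plan before acting on the next step, then review with fresh context) could be sketched roughly like this. All names here are illustrative, not any real agent API; `run_step` stands in for whatever actually does the work:

```python
# Hypothetical sketch: keep the plan as structured data and record each
# step's result immediately after acting on it, so a later fresh-context
# review pass can check every step's evidence instead of trusting the
# final summary.

plan = [
    {"step": "1", "action": "reproduce the failing behavior", "result": None},
    {"step": "2", "action": "apply the fix",                  "result": None},
    {"step": "3", "action": "re-run the checks",              "result": None},
]

def run_step(step):
    # Stand-in for the agent performing the step and reporting what happened.
    return f"done: {step['action']}"

def execute(plan):
    for step in plan:
        # Results are written before moving on, never deferred to the end.
        step["result"] = run_step(step)
    return plan

def review(plan):
    # Fresh-context pass: flag any step whose result is missing, so the
    # final conclusion can't silently skip over a failed or unrecorded step.
    return [s["step"] for s in plan if not s["result"]]
```

A review prompt like the one quoted above (“quote lines from the results verbatim”) then operates on each step’s recorded `result` rather than on the agent’s memory of the run.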
layer8 an hour ago
This is a bit like agile versus waterfall.