I've had some luck taming prompt introspection by spawning a critic agent that looks at the plan produced by the first agent and vetos it if the plan doesn't match the user's intentions. LLMs are much better at identifying rule violations in a bit of external text than regulating their own output. Same reason why they generate unnecessary comments no matter how many times you tell them not to.

▲

miohtama 4 hours ago | parent [-]

How does one integrate critic agent to a Codex/Claude?

	▲	bentcorner 3 hours ago \| parent \| next [-]
		I just say something like "spawn an agent to review your plan" or something to that effect. "Red/green TDD" is apparently the nomenclature: https://simonwillison.net/guides/agentic-engineering-pattern... I've also found it to be better to ask the LLM to come up with several ideas and then spawn additional agents to evaluate each approach individually. I think the general problem is that context cuts both ways, and the LLM has no idea what is "important". It's easier to make sure your context doesn't contain pink elephants than it is to tell it to forget about the pink elephants.
	▲	AlotOfReading 3 hours ago \| parent \| prev [-]
		You can just say spawn an agent as the sibling says. I didn't find that reliable enough, so I have a slightly more complicated setup. First agent has no permissions except spawning agents and reading from a single directory. It spawns the planner to generate the plan, then either feeds it to the critic and either spawns executors or re-runs the planner with critic feedback. The planner can read and write. The critic agent can only read the input and outputs accept/reject with reason. This is still sometimes flaky because of the infrastructure around it and ideally you'd replace the first agent with real code, but it's an improvement despite the cost.