seanmcdirmid 8 hours ago
I've been doing differential testing in Gemini CLI using sub-agents. The idea:

1. One agent writes/updates code from the spec.
2. One agent writes/updates tests from edge cases identified in the spec.
3. A QA agent runs the tests against the code. When a test fails, it examines both the code and the test (it is the only agent that can see both) to assign blame, then tells the code-writing and/or test-writing agent what it thinks the problem is so they can revise. Repeat 1 and/or 2, then 3, until all tests pass.

Since the code can never fix itself to directly pass the test, and the test can never fix itself to accept the code's behavior, you get some independence. The failure case is that the tests simply never pass, not that the test-writer and code-writer agents converge on the same incorrect understanding of the spec (which is extremely improbable, heat-death-of-the-universe improbable). It is much more likely that the spec is ambiguous, contradictory, or poorly grounded, or that the problem is too big for the LLM to handle, and so the tests simply never end up passing.
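A minimal sketch of that loop, with deterministic stubs standing in for the LLM sub-agents (in practice each function would be a prompt to a sub-agent; the spec here is "absolute value", and all agent behavior below is invented for illustration):

```python
# Toy sketch of the three-agent differential-testing loop.
# Stand-in agents: the coder first "forgets" negative inputs,
# then fixes its code when the QA agent blames it.

def code_agent(feedback=None):
    # Writes code from the spec; revises only on QA feedback.
    if feedback is None:
        return lambda x: x              # buggy first draft
    return lambda x: -x if x < 0 else x  # revised draft

def test_agent(feedback=None):
    # Writes tests from spec edge cases; never sees the code.
    return [(0, 0), (5, 5), (-3, 3)]

def qa_agent(impl, tests):
    # The only agent that sees both; runs tests and assigns blame.
    failures = [(x, want, impl(x)) for x, want in tests if impl(x) != want]
    if not failures:
        return None
    # A real QA agent would inspect code and test to decide blame;
    # this stub always blames the code.
    return ("code", failures)

impl, tests = code_agent(), test_agent()
for _ in range(5):                      # bounded retry loop
    blame = qa_agent(impl, tests)
    if blame is None:
        break                           # all tests pass
    who, failures = blame
    if who == "code":
        impl = code_agent(feedback=failures)
    else:
        tests = test_agent(feedback=failures)
```

The key structural point survives even in the toy: `code_agent` never reads the tests and `test_agent` never reads the code; only `qa_agent` sees both and routes feedback.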
jeremyjh 6 hours ago
Where is the interface defined? If it's just the coder reading the test, it can hard-code specific cases based on the test's setup/fixture data.