The real-world success they report reminds me of Simon Willison’s Red Green TDD: https://simonwillison.net/guides/agentic-engineering-pattern...

> Instead of taking a stab in the dark, Leanstral rolled up its sleeves. It successfully built test code to recreate the failing environment and diagnosed the underlying issue with definitional equality. The model correctly identified that because def creates a rigid definition requiring explicit unfolding, it was actively blocking the rw tactic from seeing the underlying structure it needed to match.
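For anyone who doesn't write Lean, here is a minimal sketch of the kind of situation the quote describes. It is not the actual Leanstral case; `myAdd` is a made-up definition for illustration. A plain `def` is not unfolded automatically by `rw`, so the `+` inside it stays hidden until you unfold it explicitly.

```lean
-- Hypothetical definition for illustration; not taken from the article.
def myAdd (a b : Nat) : Nat := a + b

example (a b : Nat) : myAdd a b = myAdd b a := by
  -- `rw [Nat.add_comm]` at this point would fail: the goal only mentions
  -- `myAdd`, so `rw` finds no `_ + _` to rewrite.
  unfold myAdd          -- goal is now `a + b = b + a`
  rw [Nat.add_comm]     -- rewrites `a + b` to `b + a`; `rw` then closes the goal by `rfl`
```

Whether this matches the exact bug in the article is an assumption on my part, but it is the same mechanism: the definition has to be unfolded before `rw` can match the structure underneath.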

jatins an hour ago

If the agent is writing the tests itself, does that offer any better correctness guarantees than just letting it write both the code and the tests?

skanga 3 hours ago

TDD == Prompt Engineering, for Agentic coding tasks.

	_boffin_ an hour ago
		Wild that it’s taken people this long to realize this. The same goes for lean tickets / tasks that carry all the context needed to complete them: references and docs, places to look in the source, acceptance criteria, and so on.