Remix clone Hacker News

new | show | ask | jobs Github

	▲	cadamsdotcom 4 days ago
		Don’t give up on TDD. I’ve invested hundreds of hours in process and tooling, and can now ship major features with tests in record time with Claude Code. You have to coach it in TDD - no matter how much you explain in CLAUDE.md. That’s part because “a test that fails because the code isn’t written yet” is conceptually very similar to “a test that passes without the code we’re about to write” and is also similar to “a test that asserts the code we’re about to write is not there”. You have to watch closely to make sure it produces the first thing. Why does it keep getting confused? You can’t blame it really. When two things are conceptually similar, models need lots of examples to distinguish between them. If the set of samples is sparse the model is likely to jump the small distance from a concept to similar ones. So, you have to accept this as how Claude 4 works, keep it on a short leash, keep reminding it that it must watch the test fail, ask it if the test failed for the right reason (not some setup issue), and THEN give it permission to write the code. The result is two mirror copies of your feature or fix: code and tests. Reviewing code and tests together is pleasant because they mirror one another. The tests forever ensure your feature works as described, no manual testing needed, no regressions. And the model knows all the tricks to make your tests really beautiful. TDD is the check and balance missing from most people’s agentic software dev process.
	▲	jonstewart 3 days ago \| parent [-]
		Oh, I will never give up on TDD. And the assistants are great at helping to write tests, and especially analyzing the tests you have and suggesting others for edge cases. But I have repeatedly seen claude get hung up on TDD itself and I've tried lots of different prompts/directions. It runs into a problem and inevitably runs ever more complicated shell commands and creating weird temp input files than sticking to "cargo test" and addressing the failing test. Since I need to review the agent's code, I'd much prefer it to use a workflow like a human, with a progression of small commits following TDD--much easier to review the code then. If it's just splatting up big diffs, then it makes review harder, and that offsets any productivity gains.