And I think human-written tests at that. If the LLM is blind to failure mode X, can it reliably be expected to even know to write a test that evaluates the behavior of X?