Remix clone Hacker News

new | show | ask | jobs Github

	▲	regularfry 2 hours ago
		The difference in outcome isn't that big but yes, you need to be more rigorous. For instance I've found that the Kimi K2.5 and K2.6 models will comment out failing tests rather than fix a problem they just caused (mistaking them for "pre-existing failures"), so you need to specifically make commented-out tests break the build. I've not personally had that problem with any of the Anthropic or OpenAI models.