Remix.run Logo
maronato 2 days ago

Claude trying to cheat its way through tests has been my experience as well. Often it’ll delete or skip them and proudly claim all issues have been fixed. This behavior seems to be intrinsic to it since it happens with both Claude Code and Cursor.

Interestingly, it’s the only LLM I’ve seen behave that way. Others simply acknowledge the failure and, after a few hints, eventually get everything working.

Claude just hopes I won’t notice its tricks. It makes me wonder what else it might try to hide when misalignment has more serious consequences.