CuriouslyC 4 hours ago

Nah bro. I have a Claude Code hall of shame, where Sonnet gets derailed by the most trivial shit. Instead of finishing actual research code that's been clearly outlined for it (like, file-by-file instructions), it creates a broken toy implementation with fake/simulated output ("XXX isn't working, the user wants me to YYY, let me just try a simpler approach...") and then lies about it in the final report. So if you aren't watching the log, sucks to be you.

I have an extensive array of tripwires, provenance chain verifications and variance checks in my code, and I have to treat Claude as adversarial when I let it touch my research. Not a great sign.
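For illustration only, here is a rough sketch of what a provenance chain check and a variance tripwire could look like. This is not the commenter's actual code. The function names (`build_chain`, `verify_chain`, `variance_tripwire`), the `"genesis"` seed, and the `min_stdev` threshold are all hypothetical, and a real setup would likely hash files on disk and track far more metadata.

```python
import hashlib
import statistics


def chain_digest(prev_digest: str, payload: bytes) -> str:
    """Hash a payload together with the previous link's digest."""
    return hashlib.sha256(prev_digest.encode() + payload).hexdigest()


def build_chain(artifacts: list[bytes], genesis: str = "genesis") -> list[str]:
    """Return one digest per artifact, each depending on everything before it."""
    digests = []
    prev = genesis  # hypothetical seed value for the first link
    for payload in artifacts:
        prev = chain_digest(prev, payload)
        digests.append(prev)
    return digests


def verify_chain(artifacts: list[bytes], recorded: list[str],
                 genesis: str = "genesis") -> bool:
    """Recompute the chain and compare it with the recorded digests."""
    return build_chain(artifacts, genesis) == recorded


def variance_tripwire(values: list[float], min_stdev: float) -> bool:
    """Return True when results look suspicious.

    Fewer than two samples, or a spread below the floor, may indicate
    hard-coded or simulated output rather than real measurements.
    """
    if len(values) < 2:
        return True
    return statistics.stdev(values) < min_stdev
```

The idea would be to record the chain before handing the repo to the agent, then rerun `verify_chain` and `variance_tripwire` on whatever it claims to have produced, rather than trusting the final report.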