AllegedAlec 6 days ago

> Coding agents have now got pretty good at checking themselves against reality, at least for things where they can run unit tests or a compiler to surface errors.

YMMV. I've seen Claude go completely batshit insane, saying that the tests all passed. Then I run them and see 50+ failures. I copy the output and tell him to fix it, and he goes into his sycophantic apologia before spinning his wheels, doing nothing, and saying all tests are back to green.