gaigalas 4 days ago
> Make your coding agent prove it first

Agents love to cheat. That's an issue I don't see changing anytime soon.

Here's Opus 4.5 trying to cheat its way out of properly implementing compatibility and cross-platform support, despite the clear requirements:

https://gist.github.com/alganet/8531b935f53d842db98157e1b8c0...

> Should popen handles work with fgets/fread/fwrite? PHP supports this. Option A: Create a minimal pipe_io_stream device / Option B: Store FILE* in io_private with a flag / Option C: Only support pclose, require explicit stream wrapper for reads.

If I asked for compatibility, why give me options that won't fully achieve it? It actually tried to "break check" my knowledge of the interpreter (testing whether I knew enough to catch it), and proposed shortcuts all the way through the chat.

I don't want to have to pepper my chats with variations on "don't cheat". I mean, I can do it, but it feels like boilerplate.

I wish I had some similar testing-related chats to share. Agents do that all the time. This is the major blocker right now for AI-assisted automated verification, and one of the reasons why it isn't well developed beyond general directions (give it screenshots, make it run the command, etc).