UncleMeat 9 hours ago
A couple weeks ago I was working on a parser that needed to handle a new file format that was a large change from existing formats. I wanted some test inputs, both valid and invalid cases. I had the codebase of a toolchain that I knew could generate valid files, some public documentation about the new file format, and my parser codebase. A good problem to throw at AI, I thought.

I handed the tools to a SOTA model and asked it to generate me some files. Garbage. Some edits to the prompts and I still get garbage. Okay, generating a binary with complex internal structure directly is pretty hard. Let's ask it to tell me how to make the toolchain generate these for me. It gives me back all sorts of CLI examples. None work. I keep telling it what output I am getting and how it differs from what I want. Over and over it fails. I finally reach out to somebody on the toolchain team and they tell me how to do it. Great, now I can generate some valid files.

Let's try to generate some invalid ones to test error paths. I've got a file. I've got the spec. I ask the LLM to modify the file to break the spec in a single way each time and tell me which part of the spec it broke each time. Doesn't work. Okay. I ask it to write me a Python program that does this. Works a little bit, but not consistently, and I need to inspect each output carefully. Finally I throw my files into a coverage-guided fuzzer corpus and over a short period of time it's generated inputs that have excellent branch coverage for me. (Sketches of both the mutation-script idea and the fuzzing setup are below.)

What would effective use have looked like to you in this situation?
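Roughly the kind of "break exactly one thing and say what you broke" script I was asking the LLM for. Every field name, offset, and bad value here is a made-up stand-in for whatever the real spec actually defines:

    #!/usr/bin/env python3
    """Take one valid file and emit variants that each violate a single
    (hypothetical) spec constraint, logging which constraint was broken."""
    import pathlib
    import struct
    import sys

    # Hypothetical constraints: (name, byte offset, struct format, invalid value).
    MUTATIONS = [
        ("magic_number",  0x00, "<I", 0xDEADBEEF),  # wrong magic
        ("version",       0x04, "<H", 0xFFFF),      # unsupported version
        ("section_count", 0x08, "<I", 0),           # spec requires at least one section
    ]

    def mutate(valid_path: str, out_dir: str) -> None:
        data = pathlib.Path(valid_path).read_bytes()
        out = pathlib.Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        for name, offset, fmt, bad_value in MUTATIONS:
            # Copy the valid file and corrupt exactly one field per output.
            broken = bytearray(data)
            struct.pack_into(fmt, broken, offset, bad_value)
            (out / f"invalid_{name}.bin").write_bytes(broken)
            print(f"invalid_{name}.bin: violates '{name}' at offset {offset:#x}")

    if __name__ == "__main__":
        mutate(sys.argv[1], sys.argv[2])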
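And roughly what the fuzzing setup looks like, assuming the parser can be driven from Python. Atheris here is just one example of a coverage-guided fuzzer, and my_parser / ParseError are made-up stand-ins for the parser under test:

    import sys
    import atheris

    # Instrument imports so the fuzzer collects branch coverage from the parser.
    with atheris.instrument_imports():
        import my_parser  # hypothetical parser module under test

    def TestOneInput(data: bytes) -> None:
        try:
            my_parser.parse(data)
        except my_parser.ParseError:
            # Rejecting malformed input is the expected path, not a finding.
            pass

    if __name__ == "__main__":
        atheris.Setup(sys.argv, TestOneInput)
        atheris.Fuzz()

Point it at the directory of valid files as the seed corpus (e.g. python fuzz_harness.py corpus/) and it mutates from those seeds, writing back any input that reaches new branches.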