The practical limits are latency and inference cost. A full planning-and-validation loop burns a lot of tokens, and waiting on that cycle breaks flow in a way that just writing the code yourself doesn't.