Does it account for errors caused by runtime bugs that forced prompts to be rerun?
That’s what happened in the real world when generating a bunch of untyped Python code.