I think my last paragraph covered the idea that it's hard work for humans to validate as it is, even with tools the LLMs don't have.