I'm not making excuses for LLMs. I'm saying that when you have a non-deterministic system for which you have to evaluate all the output for correctness due to its impredictability, it is a practically impossible task.