Remix.run Logo
dworks a day ago

Yep. I'm saying the problem is not just about interpreting and validating the output. You need to also interpret the question, since its in natural language rather than code, so its not just twice as hard but strictly impossible to reach a 100% accuracy with an LLM because you can't define what is correct in every case.