Why would you give this sort of work to a machine whose output has to be checked before it can be used responsibly anyway?
It's not obvious to me that LLMs can't be made reliable.