If your LLM is producing a proof that can be checked by another program, then there’s nothing wrong with their reliability. It’s just like playing a game whose rules are a logical system.