| ▲ | therobots927 2 hours ago | |
It’s a well known problem in higher mathematics that even if you’ve solved a problem, often the proofs are incredibly long and complex and require an extensive amount of time spent by peers to review it. It would be great if someone could explain to me how AI improves this situation. Even if AI thinks it’s solved a problem, unless the proof is incredibly efficient and well explained, it will be difficult to verify the correctness. One hallucination in 300 steps of logic is enough to destroy the entire proof. | ||
| ▲ | hilbertseries an hour ago | parent | next [-] | |
In 2012 Mochizuki claimed to have proved the abc conjecture by developing a new branch of mathematics. He was a respected mathematician, but the theories he had developed were so complex no one could determine if he was correct. It took six years until two number theorists dissected the proof and found a fatal flaw in it. | ||
| ▲ | jonahx an hour ago | parent | prev | next [-] | |
> It would be great if someone could explain to me how AI improves this situation. It's main utility is in the search step, not the verification step. The search is the bulk of the work and creativity. Separately, as the sibling commenter pointed out, it will likely get better at the verification step as well, with integrations of tools like Lean. > One hallucination in 300 steps of logic is enough to destroy the entire proof. The situation with human mathematicians is not much different. Eg, Wiles original proof of Fermat's Last Theorem contained errors found by reviewers, which he later repaired. | ||
| ▲ | skipkey an hour ago | parent | prev | next [-] | |
I would imagine that in the future AI will be doing proofs in Lean or whatever the successor to it, which gives you a pretty good confidence it’s correct. | ||
| ▲ | cdetrio 44 minutes ago | parent | prev [-] | |
The article addresses that, a formal verification layer provides a computational check that a proof is correct. There's an interesting wrinkle though. The formal definitions and statements need to be correct, i.e. need to faithfully translate the human-readable (informal) definitions and statements. A few months ago, Kevin Buzzard created an auto-formalization challenge: https://gist.github.com/kbuzzard/778bc714030b3e974ab5f403878... It's a handful of human-written lines, a skeleton that defines an "API surface". If you use AI to generate the rest of the code, the API surface (along with the Lean compiler) acts to guarantee that the whole generated solution (tens of thousands of lines) will be correct. The discussion thread about the challenge, which was solved a couple weeks ago, is here: https://leanprover.zulipchat.com/#narrow/channel/583336-Auto... Without the human-written or human-audited skeleton, you might get AI-generated slop Lean code that compiles, but contains nonsensical definitions/statements and thus vacuous proofs. Though from what I gather, recent models are much improved in their capability to generate correct statements and definitions and proofs, for both formal math and informal math. A discussion thread about a different auto-formalization effort without human-written API surface, with some correct output and some slop: https://leanprover.zulipchat.com/#narrow/channel/583336-Auto... | ||