I can read and understand e.g. Python, but I have seen subtle bugs that were hard to spot in code generated by AI. At least the last time I tried coding agents (mid 2025), it was often easier to write the code myself then play "spot the bug" with whatever was generated. I don't know anything about Lean, so I was wondering if there were similar pitfalls here.

▲

mkehrt a day ago | parent | next [-]

In this case the human written statement of the theorem is small. Can there be bugs? Absolutely! But it's essentially a few lines of code worth of thinking.

The lean proof checker then checks to make sure the proof actually proves the statement.

In this case an AI is generating the proof, but if it "compiles" it's correct. The only thing humans need to check is the statement to be proven.

(I don't know anything about this project but I've played around with lean and used other proof checkers more sesrisously).

	▲	rhdunn 17 hours ago \| parent [-]
		This is also an issue with non-AI and non-Lean proofs. Andrew Wiles' initial Fermat's Last Theorem proof initially had an error in it. That was spotted by peer review, fixed, and an updated proof was submitted.

▲

saghm 21 hours ago | parent | prev | next [-]

I don't disagree with you, but on the other hand, I feel the same way about code written by other humans, and that's not because they're necessarily worse at writing code than me, but because for code I've written myself, I've already spent time thinking about it, so I don't have to start from scratch when re-reading it. It's also not like I don't think I potentially write as many bugs as my coworkers either; it's just easier for me to catch my own up front as I'm coding than it is to catch theirs in code review. The two main differences are that I can have a more meaningful conversation with my coworkers about their approach, what bugs they might think are worth looking out for, etc. compared to an LLM (which in my experience will claim completely incorrect things about the code it wrote far more often than any engineer I've worked with even junior ones; the humans I've worked with have uniformly been able to report how confident they are in what they've produced being what they were tasked with without insane exaggerations), and that an LLM can produce a much higher volume of plausible-enough looking code in a given unit of time than most humans I've worked with. It's not obvious to me that these would be particularly severe problems in generating proofs though; unless the proof is so large that it would be infeasible to read through it in a reasonable amount of time, I would expect mathematicians to be able to make up for the lower quality conversations with the "author" by devoting more time to reading and thinking, or having someone else also read through the proof and talking through it with them. If anything, it's possible that the timeline for writing up a paper about the results might be better for some mathematicians than the equivalent amount of time most engineers have to spend reviewing code before the pressure to get it merged and move on to the next thing. (I'm aware that there is certainly pressure to get things published in academia, but I don't have firsthand experience, so I've tried to be intentional in how I've worded this to clarify that I want to avoid any assumptions about what the norms would be, but given the relatively wide range of time pressure that engineers might experience across the industry as a whole, I'd expect that at least some mathematicians might have some flexibility to spend extra time reading through an LLM-written proof, especially if it might be time they'd otherwise have to spend trying to come up with the proof themselves).

▲

Jaxan 20 hours ago | parent | prev | next [-]

If you want to check the statement, you only have to read the type. The proof itself you don’t have to read at all

▲

bryanrasmussen 16 hours ago | parent | prev [-]

As I understand it Lean is not a general purpose programming language, it is a DSL focused on formal logic verification. Bugs in a DSL are generally easier to identify and fix.

It seems one side of this argument desperately needs AI to have failed, and the other side is just saying that it probably worked but it is not as important as presented, that it is actually just a very cool working methodology going forward.

	▲	baq 9 hours ago \| parent [-]
		Lean 4 is a bit awkward, but workable as a general purpose programming language, it e.g. supports sockets (with a C module, but so does Python.)