| ▲ | aidenn0 a day ago |
| How do you verify that the AI translation to Lean is a correct formalization of the problem? In other fields, generative AI is very good at making up plausible-sounding lies, so I'm wondering how likely that is for this usage. |
|
| ▲ | sibrahim a day ago | parent | next [-] |
| That's what's covered by the "assuming you have formalized the statement correctly" parenthetical. Given a formal statement of what you want, Lean can validate that the steps in a (tedious) machine-readable purported proof are valid and imply the result from accepted axioms. This is not AI, but a tiny, well-reviewed kernel that only accepts correct formal-logic arguments. So, if you have a formal statement that you've verified, by some other means, to represent what you are interested in, Lean can tell you whether the proof created by genAI is correct. Basically, there is a nigh-infallible checker that won't accept incorrect hallucinations. |
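As a toy illustration (a hedged sketch, not from this project; the theorem name is made up), a Lean theorem has the shape below, and the kernel accepts it only if the proof term really establishes the stated type:

    -- The statement (after the colon) is the part a human must check.
    -- The proof term (after :=) can come from anywhere, including an LLM,
    -- because the kernel re-checks every step from the axioms up.
    theorem add_comm_toy (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b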
| |
| ▲ | mjevans a day ago | parent | next [-] | | I think the question is: how can humans verify that the problem statement was correctly encoded into that Lean specification? | | |
| ▲ | ndriscoll a day ago | parent | next [-] | | The problem statement is apparently > Let C>0 and ϵ>0 be sufficiently small. Are there infinitely many integers a, b, n with a ≥ ϵn and b ≥ ϵn such that a!b! ∣ n!(a+b−n)! and a+b > n + C log n? That seems like the type of thing you'd give as a homework problem to state formally in an intro class. | | |
| ▲ | Jweb_Guru 11 hours ago | parent | next [-] | | Yeah, people dramatically overestimate the difficulty of getting one's definitions correct for most problems, especially when you are doing an end-to-end proof rather than just axiomatizing some system. Definitions are still worth looking at carefully, especially for AI-generated proofs, where you don't get the immediate feedback a human gets when something expected to be hard goes through easily. But contrary to what seems to be popular belief here, they are generally much easier to verify than the corresponding proof. (For formally verified software, the analogy is verifying that the spec is what you want vs. verifying that the program matches the spec; the former is generally much easier.) | |
| ▲ | oh_my_goodness 11 hours ago | parent | prev | next [-] | | Everyone has a different perspective, based on their math background. From the OP's perspective, the formalization of this problem statement was apparently worth talking about. On the other hand, for you it's just a homework problem that belongs in an intro class. Let's just be generous and try to accept these differences. | |
| ▲ | ccppurcell a day ago | parent | prev [-] | | Are you an expert? Not gatekeeping here but I have no intuition for what is easy or hard to formalise. A lot of very simply stated graph theoretical results are apparently extremely hard to formalise. | | |
| ▲ | impendia 17 hours ago | parent | next [-] | | > Are you an expert? I can't speak for ndriscoll, but I am a university math professor with extensive experience teaching these sorts of topics, and I agree with their comment in full. You are right that some (other) statements are harder to formalize than they look. The Four Color Theorem from graph theory is an example. Generally speaking, discrete math, inequalities, for all/there exists, etc. are all easy to formalize. Anything involving geometry or topology is liable to be harder. For example, the Jordan curve theorem states that "any plane simple closed curve divides the plane into two regions, the interior and the exterior". As anyone who has slogged through an intro topology book knows, statements like this take more work to make precise (and still more to prove). | | | |
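For instance (a sketch assuming Mathlib; the theorem name here is illustrative), "there are arbitrarily large primes" is a one-liner to state, whereas the Jordan curve theorem requires the machinery of topology before you can even write it down:

    import Mathlib

    -- A typical discrete statement: quantifiers, an inequality, a predicate.
    -- Checking that this says what we mean takes a minute, not a career.
    theorem arbitrarily_large_primes : ∀ n : ℕ, ∃ p, n ≤ p ∧ Nat.Prime p :=
      Nat.exists_infinite_primes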
| ▲ | codeflo 19 hours ago | parent | prev | next [-] | | > apparently When someone takes the time to explain undergrad-level concepts in a comment, responding with "are you an expert?" is a level of skepticism bordering on hostile. The person you're responding to is correct: it's rare that the theorem statement itself is particularly hard to formalize. Whatever you read likely refers to the difficulty of formalizing a proof. | |
| ▲ | freehorse 16 hours ago | parent | next [-] | | To be fair, the comment did not explain any concept that I can see, or why this statement is simple; it gave the statement and said it was simple to formalise. It does seem simple enough to me (a basic arithmetic statement with a few variables and a bunch of quantifiers), but if somebody has no expertise or intuition, I think it is a fair question, without any hostile intent assumed. |
| ▲ | IsTom 18 hours ago | parent | prev | next [-] | | > it's rare that the theorem statement itself is particularly hard to formalize That's very dependent on the problem area. For example, there's a gap between the high-school explanation of the central limit theorem and its actual formalization. And when dealing with Turing machines, you'll sometimes say that something grows, e.g., Omega(n), when what actually happens is that there's some subsequence of inputs for which it does. Generally, for complexity theory, plain-language explanations can be very vague, because of how insensitive the theory is to small changes; you need to operate at a higher level of abstraction to have a chance of explaining a proof in reasonable time. | |
| ▲ | zozbot234 16 hours ago | parent [-] | | Yes, if the theorem statement itself is "hard to formalize" even given our current tools, formal foundations etc. for this task, this suggests that the underlying math itself is still half-baked in some sense, and could be improved to better capture the concepts we're interested in. Much of analysis-heavy math is in that boat at present, compared to algebra. |
| |
| ▲ | kelipso 13 hours ago | parent | prev | next [-] | | Lol, it's weird seeing high-school redditors saying "gatekeeping" and "are you an expert" in the same thread as university professors, all talking about the same topic. But I guess that's HN for you. |
| ▲ | phyzome 15 hours ago | parent | prev [-] | | I think it was a fine question to ask in the context of a discussion of epistemology. |
| |
| ▲ | tetha 14 hours ago | parent | prev | next [-] | | As the sibling comment says, in my experience the difficulty of formalizing the problem varies greatly between areas. Probability theory is another notorious field in which modelling a problem correctly can be very subtle and difficult. On the other hand, many problems in number theory and discrete structures tend to be rather simple to formalize. If you want to take your own look at that, I'd recommend checking out the Lean games[1]. I'd say after the natural numbers game, you most likely know enough Lean to write that problem down, with the "sufficiently small" part being the trickiest (see the sketch below). 1: https://adam.math.hhu.de/ |
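As a sketch of why that part is tricky: "P holds for all sufficiently small ε > 0" can be spelled out by hand as below (assuming Mathlib for ℝ; the definition name is made up, and the actual formalization in this story uses Mathlib's filter notation ∀ᶠ instead):

    import Mathlib

    -- "P holds for all sufficiently small ε > 0", unfolded:
    -- some threshold ε₀ exists below which P always holds.
    def SufficientlySmall (P : ℝ → Prop) : Prop :=
      ∃ ε₀ > (0 : ℝ), ∀ ε, 0 < ε → ε < ε₀ → P ε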
| ▲ | dooglius 21 hours ago | parent | prev | next [-] | | I think you may be confusing specification of the problem and the formalization of the proof. | |
| ▲ | 21 hours ago | parent | prev | next [-] | | [deleted] | |
| ▲ | schrodinger 10 hours ago | parent | prev [-] | | Voters: please reconsider your ups and downs. I think the "Are you an expert" question triggered a lot of downvotes when it was in fact asked in good faith, to judge the person's perspective on what counts as easy and hard. | |
| ▲ | NetMageSCW 8 hours ago | parent [-] | | And I would say there is no way to ask that question in good faith. (Tedious proof by cases left as an exercise for readers.) The correct question would have been: does anyone else agree with the statement? In this particular case, the amount of knowledge needed (of, e.g., the Lean language, math, and Erdős problems) means any credible statement about the difficulty requires an expert. | |
|
|
| |
| ▲ | johncolanduoni a day ago | parent | prev | next [-] | | They can read the statement, and the definitions that the statement references. If everything it references is in a well-trodden part of the Lean library, you can have pretty high confidence after a few minutes of going over the syntax. | |
| ▲ | saghm 21 hours ago | parent | prev | next [-] | | Isn't that kind of a general problem with proofs, even when they're written by humans? There's nothing stopping someone from accidentally writing their own Lean proof that has slightly different semantics than an English version of the same proof, or from having their English proof subtly miss something important or make an incorrect logical leap. This seems like a bit of a double standard, although maybe there's nuance here I'm missing. | |
| ▲ | teiferer 20 hours ago | parent [-] | | One nuance you are missing is that the discussion is about formalizing the statement (the theorem), not the proof. The latter is what the article is about, but that doesn't suffice if you can't trust that the statement is also correctly formalized. These are the two main problems: 1. Formalizing a theorem. 2. Finding a formal proof. Part 2 is where AI could help, since proof search is full of heuristics. That's also how humans find proofs, and it's one of the main skills of a mathematician. The formal proof can then be machine-checked with well-known and mature techniques not involving AI. Part 1 is the part that's missing, and it will always be hard. It's also the issue with formal verification of programs, whose correctness criteria are often very complex: it's easy to mess up the formalization, so that even if you trust the proof, you can't trust that it proves the right thing. |
| |
| ▲ | digikata 17 hours ago | parent | prev | next [-] | | To borrow the systems-engineering definitions of verification and validation: this question is one of validation. Verification is performed by Lean through spec syntax and logic enforcement, but validation asks whether the Lean spec encodes a true representation of the problem statement (was the right thing specced?). Validation at the highest levels is probably an irreplaceable human activity. Also, on the verification side, there is a window of failure in that Lean itself could have a hidden bug, and with automated systems that seek correctness, the risk is slightly elevated that some overlooked bug gets exploited in the dev-check-dev loop run by the AI. | |
| ▲ | 17 hours ago | parent | prev | next [-] | | [deleted] | |
| ▲ | threatofrain 14 hours ago | parent | prev | next [-] | | It’s far easier for Lean because the human has to read very little compared to generating whole programs. | |
| ▲ | mrtesthah a day ago | parent | prev | next [-] | | They probably need to be able to read and understand the Lean language. | |
| ▲ | aidenn0 a day ago | parent [-] | | I can read and understand e.g. Python, but I have seen subtle bugs that were hard to spot in code generated by AI. At least the last time I tried coding agents (mid 2025), it was often easier to write the code myself than to play "spot the bug" with whatever was generated. I don't know anything about Lean, so I was wondering if there were similar pitfalls here. | |
| ▲ | mkehrt a day ago | parent | next [-] | | In this case the human-written statement of the theorem is small. Can there be bugs? Absolutely! But it's essentially a few lines of code worth of thinking, and the Lean proof checker then makes sure the proof actually proves the statement. In this case an AI is generating the proof, but if it "compiles", it's correct. The only thing humans need to check is the statement to be proven; a sketch of the classic pitfall follows below. (I don't know anything about this project, but I've played around with Lean and used other proof checkers more seriously.) | |
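The classic statement-level bug is a vacuous hypothesis. This sketch (hypothetical, for illustration) compiles and is accepted by the kernel, yet proves nothing of interest; catching this sort of thing is exactly what human review of the statement is for:

    -- Vacuously true: no natural number satisfies n < 0, so the
    -- conclusion is never exercised. The kernel still accepts it.
    theorem looks_impressive (n : Nat) (h : n < 0) : n = 42 :=
      absurd h (Nat.not_lt_zero n)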
| ▲ | rhdunn 17 hours ago | parent [-] | | This is also an issue with non-AI and non-Lean proofs. Andrew Wiles' initial proof of Fermat's Last Theorem had an error in it. That was spotted by peer review, fixed, and an updated proof was submitted. |
| |
| ▲ | saghm 21 hours ago | parent | prev | next [-] | | I don't disagree with you, but on the other hand, I feel the same way about code written by other humans. That's not because they're necessarily worse at writing code than me, but because for code I've written myself, I've already spent time thinking about it, so I don't have to start from scratch when re-reading it. It's not that I write fewer bugs than my coworkers; it's just easier to catch my own up front as I'm coding than to catch theirs in code review. The two main differences are that I can have a more meaningful conversation with my coworkers about their approach, what bugs might be worth looking out for, etc. than with an LLM (which in my experience will claim completely incorrect things about the code it wrote far more often than any engineer I've worked with, even junior ones; the humans I've worked with have uniformly been able to report how confident they are in what they've produced without wild exaggeration), and that an LLM can produce a much higher volume of plausible-looking code in a given unit of time than most humans I've worked with. It's not obvious to me that these would be particularly severe problems for generating proofs, though; unless the proof is so large that reading it in a reasonable amount of time is infeasible, I'd expect mathematicians to make up for the lower-quality conversation with the "author" by devoting more time to reading and thinking, or by having someone else also read through the proof and talk it through with them. If anything, the timeline for writing up a paper about the results might leave some mathematicians more room for this than the equivalent time most engineers get before the pressure to merge and move on to the next thing. (I'm aware there is pressure to publish in academia, but I don't have firsthand experience, so I don't want to assume what the norms are; given the wide range of time pressure across the software industry, I'd expect at least some mathematicians have the flexibility to spend extra time reading through an LLM-written proof, especially if it's time they'd otherwise spend trying to come up with the proof themselves.) | |
| ▲ | Jaxan 20 hours ago | parent | prev | next [-] | | If you want to check the statement, you only have to read the type. The proof itself you don't have to read at all (see the sketch below). | |
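Concretely (a toy sketch, not from the project): in the theorem below, only the type between the colon and := needs reading; the proof body could be thousands of machine-generated lines and the guarantee would be the same:

    -- Read this line...
    theorem double_eq_add_self (n : Nat) : 2 * n = n + n := by
      -- ...but not this part; the kernel checks it for you.
      omega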
| ▲ | bryanrasmussen 16 hours ago | parent | prev [-] | | As I understand it, Lean is not a general-purpose programming language; it is a DSL focused on formal logic verification, and bugs in a DSL are generally easier to identify and fix. It seems one side of this argument desperately needs the AI to have failed, and the other side is just saying that it probably worked but is not as important as presented, that it is actually just a very cool working methodology going forward. | |
| ▲ | baq 9 hours ago | parent [-] | | Lean 4 is a bit awkward, but workable, as a general-purpose programming language; it supports sockets, for example (via a C module, but so does Python). |
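For example, this is a complete Lean 4 program (a minimal sketch) that builds and runs like any other compiled language:

    -- Compile and run with `lean --run Main.lean`.
    def main : IO Unit :=
      IO.println s!"2 + 2 = {2 + 2}"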
|
|
| |
| ▲ | luckydata 21 hours ago | parent | prev [-] | | The same way you verify that any other program compiles? I don't understand the question, tbh; it seems self-evident. | |
| ▲ | NetMageSCW 7 hours ago | parent [-] | | Compiling isn't sufficient, because it doesn't tell you whether the program matches the specification. A program that always says the temperature is 80 °F will compile, but it is a terrible solution to "what is the temperature outside at this location right now?" |
|
| |
| ▲ | oh_my_goodness 11 hours ago | parent | prev | next [-] | | >That's what's covered by the "assuming you have formalized the statement correctly" parenthetical. Sure. But it's fair to ask how to validate that assumption. | | |
| ▲ | fsmv 11 hours ago | parent [-] | | Skilled humans must understand the problem and write the theorem statement. |
| |
| ▲ | bee_rider 10 hours ago | parent | prev | next [-] | | How hard is it to go back to English, from Lean? Just as hard as going from English to Lean? If it is easier to convert backwards, maybe the AI can at least describe what the equations mean… | |
| ▲ | f1shy a day ago | parent | prev | next [-] | | > This is not AI, A little nitpicking, but according to books like AIMA, that is indeed AI; in the first chapter, even any control system is classified as AI. For the reasons stated in that chapter, I totally agree with the authors: the whole system is AI. That part is the verifier in a "suggestion/instinct -> verifier" chain, like those used in neurosymbolic systems for automated driving, for example. |
| ▲ | heroprotagonist 20 hours ago | parent | prev | next [-] | | Soo, it can definitively tell you that 42 is the correct Answer to the Ultimate Question of Life, the Universe, and Everything. It just can't tell you if you're asking the right question. | |
| ▲ | Someone 19 hours ago | parent | next [-] | | No, it can tell you that 42 is the answer to (some Lean statement), but not what question that Lean statement encodes. |
| ▲ | davidwritesbugs 18 hours ago | parent | prev [-] | | I recently learnt that Douglas Adams wrote code, and the ultimate answer was '*', which is 42 in ASCII. |
| |
| ▲ | exe34 16 hours ago | parent | prev | next [-] | | The argument here is that: 1. you write a proof in English that there is an infinite number of primes.
2. the LLM writes 2+2=4 in Lean.
3. Lean confirms that this is correct and that it's impossible for this proof to be wrong. | |
| ▲ | NetMageSCW 7 hours ago | parent [-] | | You missed a whole section: a person creates a Lean formalization of #1, and Lean promptly says the AI proof is wrong because it doesn't prove that formal problem statement. The question is about the person (or AI) creating the formal problem statement: how do you know it represents the problem the proof is supposed to be for? And the answer is that for people in the field, in this case, formalizing the problem and verifying the formalization is easy. It is like generating a public key versus factoring it. | |
| ▲ | exe34 6 hours ago | parent [-] | | I thought that's what I was trying to express between lines 1 and 2 above, but I may have failed to get it across. My understanding is that the danger is that the LLM will create a proof that is correct but isn't about what the person thinks they're proving? |
|
| |
| ▲ | rurban 20 hours ago | parent | prev [-] | | Lean is doing logical AI, the classical AI part. Aristotle is doing the matching AI part, the modern LLM approach, previously called fuzzy logic. Both are AI. | | |
| ▲ | teiferer 20 hours ago | parent [-] | | Calling Lean "AI" is quite a stretch. Though I'm also in the camp that dislikes the inflationary use of "AI" for LLMs, so I have sympathies for your viewpoint. | | |
| ▲ | baq 19 hours ago | parent [-] | | Finding a path in a maze was AI once. | | |
| ▲ | chpatrick 13 hours ago | parent [-] | | I think that definition has been pretty much obsolete for the last 20 years. To me, "AI" is machine learning: statistical algorithms trained on data. That's not true of Lean. | |
| ▲ | baq 9 hours ago | parent [-] | | So basically anything we don’t know how to write an algorithm for? I see where you’re coming from - but at the same time it’s actually an AI meme and smells of permanently moving goalposts. |
|
|
|
|
|
|
| ▲ | svat 9 hours ago | parent | prev | next [-] |
It may help to look at this example concretely. The natural-language statement of the problem is (from https://www.erdosproblems.com/728):

> Let C>0 and ϵ>0 be sufficiently small. Are there infinitely many integers a, b, n with a ≥ ϵn and b ≥ ϵn such that a!b! ∣ n!(a+b−n)! and a+b > n + C log n?

The Lean-language statement of the problem (which can be done either by hand or by AI) is (from https://github.com/plby/lean-proofs/blob/f44d8c0e433ab285541...):

    ∀ᶠ ε : ℝ in [>] 0, ∀ C > (0 : ℝ), ∀ C' > C,
      ∃ a b n : ℕ,
        0 < n ∧
        ε * n < a ∧
        ε * n < b ∧
        a ! * b ! ∣ n ! * (a + b - n)! ∧
        a + b > n + C * log n ∧
        a + b < n + C' * log n

Yes, on the one hand, one needs to know enough about Lean to be sure that this formulation matches what we intend and isn't stating something trivial. But on the other hand, this is not as hard as finding an error on some obscure line of a long proof. (There's also an older formulation at https://github.com/google-deepmind/formal-conjectures/blob/f... but the new one is more in the spirit of what was intended: see the discussion starting at https://www.erdosproblems.com/forum/thread/728#post-2196 which gives a clear picture, as of course does Tao's thread in the OP that summarizes this discussion.)
| |
| ▲ | kovek 8 hours ago | parent [-] | | I'm wondering how people come up with these mathematical challenges. |
|
|
| ▲ | maxwells-daemon a day ago | parent | prev | next [-] |
For this reason, when we announce results on e.g. the IMO, we formalize the statements by hand and inspect the proofs carefully to ensure they capture the full spirit of the problem. There are also some good heuristics: if you expect a problem to be hard and the proof is very short, you've probably missed something! |
| |
|
| ▲ | maccam912 16 hours ago | parent | prev | next [-] |
To answer the question a different way: I think you are asking how we know the proof actually matches the description the human provided. We can't know for sure, but the idea is that you can write the problem statement concisely enough to check it yourself, e.g. "there are an infinite number of primes" or whatever, and then even if an LLM goes off and makes up a Lean proof wildly different from your description, if Lean says the proof is valid, you have proven the original statement. In theory the actual proof could be way different from what you thought it would be, but ultimately all the logic will still check out. |
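For instance (a sketch assuming Mathlib; the theorem name is illustrative), "there are an infinite number of primes" is one short, human-checkable line, no matter how the proof behind it was found:

    import Mathlib

    -- Only this statement needs human review; the proof term could be
    -- replaced by anything the kernel accepts.
    theorem primes_infinite : {p : ℕ | p.Prime}.Infinite :=
      Nat.infinite_setOf_prime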
|
| ▲ | rtpg a day ago | parent | prev | next [-] |
I feel like even outside of AI translation, a formalization failing to capture the spirit of the informal description is always a risk. This is also a big risk when trying to prove code correctness: "prove this algo works" means you have to define "works" along certain axes, and if you're very unlucky you might end up with a proof that exploits the uncertainty around one of those axes. |
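The canonical example of that risk (a hedged sketch, assuming Mathlib; the names are made up): if "this sort works" is formalized only as "the output is sorted", a proof can exploit the gap, because a function that throws the data away satisfies the spec:

    import Mathlib

    -- An underspecified "sort": discards all input data, yet provably
    -- returns a sorted list. The spec also needs a permutation clause.
    def badSort (_ : List ℕ) : List ℕ := []

    theorem badSort_sorted (xs : List ℕ) :
        List.Sorted (· ≤ ·) (badSort xs) :=
      List.sorted_nil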
|
| ▲ | jlouis a day ago | parent | prev | next [-] |
| The statement is something you provide. It's the search you can have the LLM do. If this works for math it will immediately make code way higher quality via the same tools. |
|
| ▲ | roenxi a day ago | parent | prev [-] |
You're looking for the practical answer, but philosophically it isn't possible to translate an informal statement into a formal one 'correctly': it is informal, i.e., vaguely specified. The only certain question is whether the formal axioms and results are interesting, which is independent of the informal statement, and that can only be established by inspecting the proof independently of the informal spec. |
| |
| ▲ | wizzwizz4 a day ago | parent [-] | | Philosophically, this is not true in general, but that's for trivial reasons: "how many integers greater than 7 are blue?" doesn't correspond to a formal question. It is absolutely true in many specific cases. Most problems posed by a mathematician will correspond to exactly one formal proposition, within the context of a given formal system. This problem is unusual, in that it was originally misspecified. | | |
| ▲ | red75prime 19 hours ago | parent [-] | | I suppose there's no formally defined procedure that accepts a natural language statement and outputs either its formalization or "misspecified". And "absolutely true" means "the vast majority of mathematicians agree that there's only one formal proposition that corresponds to this statement". | | |
| ▲ | varjag 18 hours ago | parent [-] | | I think you suppose wrong. A statement like "the area of the square whose side is the hypotenuse is equal to the sum of the areas of the squares on the other two sides" doesn't seem out of reach of an algorithmic procedure like classical NLP. | |
| ▲ | red75prime 18 hours ago | parent [-] | | Sure, we can write a procedure that recognizes some formal grammar, which intersects with the natural language. Defining the formal grammar that fully captures the current natural language understanding of the mathematical community is a bit harder. | | |
| ▲ | wizzwizz4 13 hours ago | parent [-] | | This problem was even worse: it's matched by the formal grammar, but the naïve formalisation has a trivial answer, so it is clearly not what was intended. | | |
| ▲ | NetMageSCW 7 hours ago | parent [-] | | That "clearly" may be doing some heavy lifting. It is assumed that the trivial answer wasn't what was intended for the problem, but unless someone asked Erdős, I don't think we know. | |
| ▲ | wizzwizz4 5 hours ago | parent [-] | | Considering that he did some work towards the problem, tackling non-trivial cases, I think we do know. There's no way he wouldn't have perceived trivial solutions at some point. |
|
|
|
|
|
|
|