I've worked with Claude Code/Codex, and they're amazing, but I have a question.

I have never worked with tools such as Lean, but if an LLM writes a proof with Lean, how can one be sure that the code itself (in the proof) is correct? Or is Lean so strict that this is unlikely to happen?

▲

enricozb 3 days ago | parent [-]

If it compiles, it typechecks. If it typechecks, the proof is correct. This is a consequence of the Curry-Howard correspondence [0].

From a pure mathematician's standpoint, the content of a proof is kind of (waving big hands here) irrelevant. It's more that the existence of a proof implies the theorem is true, and thus can be used to prove other theorems. Because of this "proof-irrelevance", a great foil to an LLM's propensity to hallucinate is something like Lean. In Lean, and in theorem provers oriented towards classical mathematics, the statement of the theorem being proved is what matters most. And due to Curry-Howard, the statement is equivalent to the signature/declaration of the function, which is what the human ought to verify, while Lean will verify the body.

[0]: https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspon....

▲

wild_egg 3 days ago | parent | next [-]

The thing I'm stuck on is that, yes, the proof is correct.

But what guarantees do I have that the proof or theorem is actually representative of the program I'm trying to specify?

And beyond that, what guarantees are there that the resulting end program actually adheres to either the specs or what has been proven?

Like it seems the goal here is `English -> Lean -> Code in whatever language` instead of just `English -> Code` but there's no way to be certain the translation step on either side of Lean has actually been done correctly.

I've done a lot of experiments around this so far this year and theorem provers can certainly help provide LLMs with more context but there is no tooling to actually verify that the resulting Python fully covers the corresponding Lean that was fed in.

	▲	hatefulmoron 3 days ago \| parent \| next [-]
		It's often said that formal verification moves bugs from the implementation to the specification, and there's also the problem of proving equality between the formally proven specification and the implementation. The first problem is just hard, the second can be approached by using a language like F* or Dafny that compiles to executable code. IIRC Amazon has had some success by using Lean for the specification and using fuzzing to test that the implementation matches.
	▲	alimw 3 days ago \| parent \| prev [-]
		You're using an LLM to translate Lean to Python? Care to tell us more? If you want to prove stuff about programs then Lean is a very capable tool. But it's intended that you write the program itself in Lean and not in some other language. I guess if you ask an LLM nicely it will attempt to translate a program that has already been written in Lean to one in Python, but why would you want to do that? I doubt it will run any faster.

▲

sealeck 3 days ago | parent | prev | next [-]

> If it compiles, it typechecks. If it typechecks, the proof is correct. This is a consequence of the Curry-Howard correspondence

I think this perhaps the most common misconception of how formal verification works. If the program typechecks, then you have a proof that the program (or often a model of the program) implements the specification. However, you still have to make sure that your _specification_ is correct! This is one of the hardest parts of formal verification!

	▲	johnisgood 2 days ago \| parent [-]
		Exactly. It does not mean your formal specification is correct (of the code and what it should do).

▲

omnicognate 3 days ago | parent | prev [-]

If it compiles it definitely proves something, which may or may not be what Claude tells you it proves.