pron 5 hours ago

> It’s no longer just for safety-critical systems with the budget for specialized proof engineers. It’s for anyone who has a property worth proving

... and the budget to pay the AI to prove it.

I have quite a bit of experience with formal verification, but I don't understand the claim made in the article. As an aside, AI's ability to reliably prove the correctness of significantly large programs is still theoretical at this point, but let's assume it's possible. The claim in the article is that writing 10,000 lines of proof to prove a 100-line program was very expensive, and that's why it isn't done. But this increase in cost continues with AI! Whether you pay people to write the proofs or you pay an LLM to write the proof, you still have to pay for it. If I run a software company, saying that "verification is the AI's problem" isn't much different from saying, "it's the engineers' problem." Either way I'm not doing the work myself, but I am paying for it.

If the premise is that writing proofs was 100x more expensive than testing, I see nothing in this article to even suggest why it wouldn't still be 100x more expensive when an LLM is doing the work.

(BTW, the reason there aren't many specialised proof engineers is that they aren't in high demand; they're not being paid that much more than other engineers at a similar level)

rurban 5 hours ago | parent [-]

> writing 10,000 lines of proof to prove a 100-line program was very expensive, and that's why it wasn't done.

We are not that silly. We are writing compilers (i.e. model checkers) which translate the source code to formal proofs. No cost at all; you just need to limit loop sizes and function call depths to keep the cost of the proof down. And then extrapolate the little proof to the general proof.
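
To make the "limit the sizes" idea concrete, here is a minimal sketch of bounded exhaustive checking in plain Python. It is not a real model checker and not any particular tool's method: `binary_search`, `check_bounded`, and the bounds are hypothetical names and values chosen for illustration. It only shows that a property can be checked for every input up to a small bound, and it says nothing about inputs beyond that bound.

```python
from itertools import combinations_with_replacement


def binary_search(xs, target):
    """Return an index of target in the sorted sequence xs, or -1 if absent."""
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(xs) and xs[lo] == target else -1


def check_bounded(max_len, values):
    """Check binary_search on every sorted input within the bound.

    values must be given in ascending order, so that each generated
    tuple is itself sorted. Returns a counterexample (xs, target),
    or None if the property holds for all inputs up to max_len.
    """
    for n in range(max_len + 1):
        for xs in combinations_with_replacement(values, n):
            for target in values:
                idx = binary_search(xs, target)
                if target in xs:
                    # A present target must be found at a matching index.
                    if idx == -1 or xs[idx] != target:
                        return xs, target
                elif idx != -1:
                    # An absent target must be reported as -1.
                    return xs, target
    return None


if __name__ == "__main__":
    # Assumed bound: sequences of length <= 4 over the values 0..3.
    print(check_bounded(4, range(4)))
```

The number of cases grows quickly with the bound, which is why the bound has to stay small, and the step from "holds up to the bound" to "holds in general" still needs a separate argument.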

pron 5 hours ago | parent [-]

Whatever the cost multiplier is, I see no reason why that same multiplier won't remain with AI.

Personally, I don't think that picture is quite accurate. Yes, there is a high cost multiplier for small programs, albeit perhaps not so prohibitive. But for large programs, that multiplier is, for most intents and purposes, infinite, unless, perhaps, you have experts who know what's worth proving and what is not.

Anyway, I'd like to see that put to the test. Have an LLM write a 50-100KLOC program and prove all correctness properties - with the properties themselves approved by an expert human - and tell us what it cost. A colleague of mine stopped his AI proof experiment when he got an email from some functionary at the company to stop doing what he was doing with the model, because it was costing too much money.

win311fwg 39 minutes ago | parent [-]

> Have an LLM write a 50-100KLOC program and prove all correctness properties [...] and tell us what it cost.

Assuming the 50-100KLOC program is of real-world use and not something contrived for the sake of offering something to prove, it is unlikely that proving all correctness properties will be possible, fundamentally. So costs will be nothing — or infinite if you foolishly remain determined to try the impossible.

In the real world we restrict what properties we care about and what models we reason in. Some of those models are woven into the fabric of an LLM. I would think the cost multiplier in those cases is much lower for an LLM as compared to a human that doesn't have an inherent understanding and needs to give it thought. Wouldn't you?