Remix.run Logo
rurban 5 hours ago

> writing 10,000 lines of proof to prove a 100-line program was very expensive, and that's why it wasn't done.

We are not that silly. We are writing compilers (ie model checkers) which translate the source code to formal proofs. No cost at all, you just need to limit loop sizes and function call depths, to keep the cost of the proof down. And then extrapolate the little proof to the general proof.

pron 5 hours ago | parent [-]

Whatever the cost multiplier is, I see no reason why that same multiplier won't remain with AI.

Personally, I don't think that picture is quite accurate. Yes, there is a high cost multiplier for small programs, albeit perhaps not so prohibitive. But for large programs, that multiplier is, for most intents and purposes infinite, unless, perhaps, you have experts who know what's worth proving and what is not.

Anyway, I'd like to see that put to the test. Have an LLM write a 50-100KLOC program and prove all correctness properties - with the properties themselves approved by an expert human - and tell us what it cost. A colleague of mine stopped his AI proof experiment when he got an email from some functionary at the company to stop doing what he was doing with the model, because it was costing too much money.

win311fwg 40 minutes ago | parent [-]

> Have an LLM write a 50-100KLOC program and prove all correctness properties [...] and tell us what it cost.

Assuming the 50-100KLOC program is of real-world use and not something contrived for the sake of offering something to prove, it is unlikely that proving all correctness properties will be possible, fundamentally. So costs will be nothing — or infinite if you foolishly remain determined to try the impossible.

In the real world we restrict what properties we care about and what models we reason in. Some of those models are woven into the fabric of an LLM. I would think the cost multiplier in those cases is much lower for an LLM as compared to a human that doesn't have an inherit understanding and needs to give it thought. Wouldn't you?