jsenn 5 hours ago

> isn't the verification code going to be sloppy as well

The beauty of formal methods is that it doesn't matter if your proof is sloppy. As long as it passes the proof checker, the property it proves holds. And unlike in pure math, the proof that a software system is correct is usually a huge mess of special cases, loop invariants, proofs by induction, and boilerplate that requires a large amount of human labour while providing no insight.
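
As a toy illustration (my own sketch in Lean 4, not taken from any real project; the theorem names are made up), here are two proofs of the same fact. The first does a pointless case split and the second is a single term, yet the checker accepts both equally:

```lean
-- A deliberately clumsy proof: the case split is unnecessary,
-- but the checker only cares that every case is closed.
theorem add_zero_sloppy (n : Nat) : n + 0 = n := by
  cases n with
  | zero => rfl
  | succ k => rfl

-- The tidy proof of the same statement: it holds by definition of `+` on Nat.
theorem add_zero_tidy (n : Nat) : n + 0 = n := rfl
```

Both theorems state exactly the same thing, so anything downstream that relies on the statement can't tell which proof was used.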

Proofs are also brittle: a tiny change in the code can force you to throw your proof away and start from scratch.

To me, the exciting thing about formal methods in the LLM era is that they let humans offload the difficult and tedious work of writing proofs to a computer. Taken to an extreme, the human could live entirely in the world of a formal specification, and the LLM could generate 100% of the code. The code may be a mess, but if the system proves it satisfies the spec, then it can't be wrong with respect to that spec.
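
A minimal sketch of that split in Lean 4, under the assumption that the human only reads the theorem statement while a tool produces everything else. `clumsyDouble` and `clumsyDouble_spec` are hypothetical names invented for this example:

```lean
-- Hypothetical machine-generated implementation, needlessly convoluted:
-- the special case for 0 is redundant.
def clumsyDouble (n : Nat) : Nat :=
  if n = 0 then 0 else n + n

-- The part a human would review: the spec, stated as a theorem.
-- The proof script below it could also be machine-generated.
theorem clumsyDouble_spec (n : Nat) : clumsyDouble n = 2 * n := by
  unfold clumsyDouble
  -- One goal per branch of the `if`; linear arithmetic closes both.
  split <;> omega
```

The reviewer never has to understand the body of `clumsyDouble`; they only have to agree that `clumsyDouble n = 2 * n` is the property they actually wanted.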

pron 4 hours ago

The problem is that generating either code or proofs with LLMs is very expensive, and generating good proofs (I don't mean elegant, I mean proving the most important properties) is probably not very fast, either. Reducing the verification time of a program from 100 years to 10 years or the cost from $1bn to $100m is still not practical enough to become truly mainstream.

Things can be improved when people help guide and focus the LLMs, but these people still need to be formal methods experts.

odyssey7 4 hours ago

So, formal methods produce runnable systems, but communication remains the challenge.

If a formal spec is messy, then it's a proof of ... what, exactly?

A formal specification that bridges tech and product, one that lets non-technical contributors read and discuss all the logical nuances directly as operational code, at the level of abstraction product cares about, would transform a lot.

It's no longer a challenge to create code; the challenge is to create business requirements and translate them into systems.