Remix.run Logo
zozbot234 a day ago

Formal verification of program correctness is also (for obvious reasons) key to unlocking AI-driven synthesis (i.e. 'vibe' coding) of "correct" programs that will verifiably meet the given spec.

eru a day ago | parent | next [-]

Not all aspects of a spec can be formally encoded. But even half-way houses are good.

Eg you can give the vague spec 'build me a todo list app', but you can still formally prove that everything your app does finishes, or even that it finishes in reasonable time.

f1shy 21 hours ago | parent [-]

Yes. For sure we will never be able to 100% automate the whole SWE process. As you say, the first input is a human wish, and there comes the joke of the genie that always screw the wishes by leaving something obvious out, because not explicitly specified. Also I think at some point the halting problem will make some programs impossible to test. But it would so great, program in a loose syntax, but with more safety than Rust and Ada together

eru 25 minutes ago | parent [-]

> Also I think at some point the halting problem will make some programs impossible to test.

No, not at all. The halting problem isn't much of a problem here.

To elaborate: yes, it's pretty much impossible to decide whether an arbitrary programme will halt. But we aren't dealing with arbitrary programmes, you carefully have your agent craft programmes that are easy to prove correct.

There are languages available today whose type systems already only let you write terminating programmes. See eg https://news.ycombinator.com/item?id=32102203 the Dhall language. Or Agda or Lean itself (unless you specifically opt out via the 'partial' keyword. But it's trivial to check whether someone used 'partial'.)

If your agent write a programme that's not easy to prove to be terminating, you don't try harder to prove. You just flag that as an error and have the agent try again.

Just like as a human code reviewer you reject Pull Requests that are too complicated to understand: you don't even bother figuring out whether they are technically correct or not.

tomjen3 17 hours ago | parent | prev [-]

It will certainly help - but its an extremely high bar. Almost all formal verification of software today is "does this pass the typechecker"?.

Now this captures some errors, but it doesn't really capture high level ones (is this program guaranteed to not deadlock is a hard one), and it doesn't capture the one that is important for business purposes (does this do what the customer wants). That requirement is more important than correctness (vitness all the software that is described as "crap", but is nonetheless widely used).

I don't think this is a required key to unlocking vibe coding. That seems to be easy: does this provide business value? And there the answer seems roughly to be "yes".