mannykannot 3 days ago

I will just float this idea for consideration, as I cannot judge how plausible it is: Is it possible that LLMs or their successors will soon be able to make use of formal methods more effectively than humans? I don't think I am the only person surprised by how well they do at informal programming. (On the other hand, there is a dearth of formal-methods training material. Maybe a GAN-style approach would help here?)

danaris 3 days ago | parent | next [-]

LLMs, at least as they exist today, could be trained to be very good at producing text that looks like formal specifications, but there would be no way to guarantee that what they produced was (a) well-formed or (b) a correct specification of what was actually asked of them.

You might be able to address (a) by requiring the LLM to feed its output through a solver and redo anything that fails (this is where my knowledge of current LLMs kinda falls down), but (b) is still a fundamentally hard problem.
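
A minimal sketch of that check-and-retry loop, assuming a hypothetical call_llm completion helper and using Z3's SMT-LIB parser as the well-formedness check. This only addresses (a); whether the spec means what the user intended, (b), is not checked here:

    import z3  # pip install z3-solver

    def wellformed_spec(prompt, call_llm, max_attempts=5):
        # Ask the model for an SMT-LIB spec; retry until Z3 can parse it.
        # call_llm is a hypothetical (prompt: str) -> str helper.
        feedback = ""
        for _ in range(max_attempts):
            text = call_llm(prompt + feedback)
            try:
                z3.parse_smt2_string(text)  # raises Z3Exception if malformed
                return text  # syntactically valid; meaning still unchecked
            except z3.Z3Exception as e:
                feedback = f"\nYour last attempt failed to parse: {e}. Fix it."
        raise RuntimeError("no well-formed spec within budget")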

wakawaka28 2 days ago | parent | next [-]

I don't see why two LLMs together (or one, alternating between tasks) could not separately develop a spec and an implementation. The human input could be a set of abstract requirements, and the two systems would interact and cross-check each other against those requirements, perhaps with some code/spec reviews by humans. I really don't see it ever working without one or more humans in the loop, if only to confirm that what is being done is actually what the human(s) intended. The humans would ideally say as little as possible to get what they want. Unless/until we get powerful AGI, we will need technical human reviewers.
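
One way that cross-check could be wired up, as a rough sketch only: the call_spec_llm, call_impl_llm, and check helpers, and the verdict object's ok/blame/detail fields, are all assumptions rather than any real API.

    def cross_check(requirements, call_spec_llm, call_impl_llm, check, rounds=10):
        # One model drafts a formal spec, the other an implementation; a
        # checker (e.g. a solver or test harness) arbitrates between them.
        spec = call_spec_llm(f"Write a formal spec for: {requirements}")
        impl = call_impl_llm(f"Implement this spec:\n{spec}")
        for _ in range(rounds):
            verdict = check(spec, impl)
            if verdict.ok:
                return spec, impl  # a human still reviews spec vs. intent
            if verdict.blame == "impl":
                impl = call_impl_llm(f"Spec:\n{spec}\nFix this failure:\n{verdict.detail}")
            else:
                spec = call_spec_llm(f"Revise the spec; issue:\n{verdict.detail}")
        raise RuntimeError("no agreement; escalate to the humans in the loop")

Note that both exits still end at a human: either reviewing the agreed pair or resolving the deadlock, which is the point about keeping humans in the loop.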

danaris 2 days ago | parent [-]

> I really don't see it ever working without one or more humans in the loop, if only to confirm that what is being done is actually what the human(s) intended.

That is precisely my point.

dontlaugh 3 days ago | parent | prev [-]

Some already use probabilistic methods to automatically produce proofs for specifications and code that they wrote manually. It's one of the few genuinely promising use cases for LLMs.

Yoric a day ago | parent | prev [-]

Use them?

Absolutely. See DeepSeek-Prover, for instance. As far as I understand, it's basically a neurosymbolic system: it uses an LLM to write proofs/code, then Lean to verify them, looping until it finds a proof/implementation that matches the specification, or gives up (a rough sketch of the loop follows below).

Create them? Much harder.
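
The generate-and-verify loop, sketched: call_llm is a hypothetical completion helper, and each candidate proof is checked by shelling out to the lean CLI. This illustrates the pattern, not DeepSeek-Prover's actual code.

    import os, subprocess, tempfile

    def prove(theorem_stmt, call_llm, budget=16):
        # Generate candidate Lean proofs and let Lean itself be the judge.
        prompt = f"Write a Lean 4 proof of:\n{theorem_stmt}"
        for _ in range(budget):
            candidate = call_llm(prompt)
            with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
                f.write(candidate)
                path = f.name
            try:
                result = subprocess.run(["lean", path], capture_output=True, text=True)
            finally:
                os.unlink(path)
            if result.returncode == 0:
                return candidate  # Lean accepted it: verified by construction
            prompt = (f"Write a Lean 4 proof of:\n{theorem_stmt}\n"
                      f"Previous attempt failed with:\n{result.stderr}")
        return None  # give up, as described above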