Rochus 5 hours ago

I think there is a misunderstanding about what is being back-translated.

We don't back-translate the proof steps (the thousands of intermediate logical derivations). That would indeed be verbose and useless.

We back-translate the specification: the Invariants, Guards, and Events.

For a traffic light system, we don't need the LLM to explain the 50 steps of predicate logic that prove inv3 holds. We just need it to translate inv3 itself:

    Formal: inv3: light_NS = Green ⇒ light_EW = Red

    Back-translation: 'Invariant: If the North-South light is Green, the East-West light MUST be Red.'

This isn't verbose; it's exactly the concise summary of the system's safety rules. The 'verbosity' of handling edge cases (like the 'Allsome' example) only arises when the specification itself relies on subtle edge cases, and in that case being verbose is exactly what you want in order to prevent a hidden bug.
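
To make 'the specification' concrete, here is a minimal sketch of such a system in rough Event-B style (the machine name, variables, and event are hypothetical, invented here for illustration; the Green/Red constants would live in a separate context):

    MACHINE TrafficLight
    VARIABLES light_NS, light_EW
    INVARIANTS
        inv1: light_NS ∈ {Green, Red}
        inv2: light_EW ∈ {Green, Red}
        inv3: light_NS = Green ⇒ light_EW = Red
    EVENTS
        switch_NS_green ≙
            WHERE grd1: light_EW = Red
            THEN  act1: light_NS ≔ Green
            END
    END

The back-translation only ever has to cover lines like these: the invariants, the guard ('switch_NS_green may only fire while the East-West light is Red'), and the action ('it sets the North-South light to Green'). The proof obligations discharged behind the scenes stay behind the scenes.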
ratmice 4 hours ago

Definitions are built up layer upon layer like an onion too, with each step adding its own invariants, reducing the problem space.
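
Concretely, each new layer stacks its own invariants on the last; a second layer in the same rough Event-B style (hypothetical names, and glossing over the gluing invariants a real refinement would need) might look like:

    MACHINE TrafficLightYellow REFINES TrafficLight
    VARIABLES light_NS, light_EW
    INVARIANTS
        inv4: light_NS ∈ {Green, Yellow, Red}
        inv5: light_NS ≠ Red ⇒ light_EW = Red
    EVENTS
        switch_NS_yellow ≙
            WHERE grd1: light_NS = Green
            THEN  act1: light_NS ≔ Yellow
            END
    END

Multiply that by dozens of layers and the set of invariants to back-translate grows with it.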

I just feel like the traffic light example is an extremely small, free-standing example. Most things that I feel are worth the effort of proving end up huge. Formal verification languages have forever been denigrated as overly rigid and too verbose, and I feel like translations into natural language, if they are accurate, can only increase that verbosity.

One thing I wish is that this whole discussion were less intertwined with AI. The semantic gap existed before AI, and will be run into again without AI. People have been accidentally proving the wrong thing true or false forever, and they will never stop, with or without AI's help.

At the very least we can agree that the problem exists. While I'm skeptical that natural language is anything but the problem we ran away from, at least you're trying something and exploring the problem space, and that can only be cheered.

Rochus 4 hours ago

My bet is that AI changes the economics of that verbosity, making it cheap to generate and check those 'huge' definitions layer by layer. The next four years will tell.