Rochus 3 days ago

Interesting article, thanks. There is indeed a "semantic gap". However, there is also a practical solution: bidirectional LLM translation. You can verify the formal specification by back-translating it to natural language with another LLM session, allowing human review at the intent level rather than requiring expertise in e.g. Event-B syntax (see https://rochuskeller.substack.com/p/why-rust-solves-a-proble...). This addresses the concern about "mis-defining concepts" without requiring the human to be a formal methods expert. The human can review intent and invariants in natural language, not proof obligations. The AI handles the mathematical tedium while the human focuses on domain correctness, which is exactly where human expertise belongs.
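
To make that concrete, here is a minimal sketch of the round trip, with a hypothetical llm() helper standing in for whatever model API is used; the key point is that the back-translation runs in a fresh session that never sees the original prose:

    # Round-trip sketch: NL requirement -> formal spec -> independent back-translation.
    # llm() is a hypothetical stand-in for any chat-completion API; each call is
    # assumed to run in a fresh session, so the back-translator cannot "cheat" by
    # remembering the original natural-language text.

    def llm(prompt: str) -> str:
        """Hypothetical single-shot model call with a fresh context."""
        raise NotImplementedError("wire up your model API here")

    def round_trip(nl_spec: str) -> tuple[str, str]:
        formal = llm(
            "Translate this requirement into Event-B invariants and events. "
            "Output only the formal text.\n\n" + nl_spec
        )
        # Second, independent session: sees only the formal text, never nl_spec.
        back = llm(
            "Translate this Event-B specification into structured, explicit "
            "English. Flag any vacuous-truth or existential assumptions.\n\n" + formal
        )
        return formal, back

    # A human then diffs `back` against the original intent; any mismatch
    # signals a translation error in one direction or the other.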

viraptor 6 hours ago | parent | next [-]

> there is also a practical solution: bidirectional LLM translation

It's a solution only if the translation is proven correct. If not, you're back where you started.

ratmice 8 hours ago | parent | prev [-]

why do we invent these formal languages except to be more semantically precise than natural language? What does one gain besides familiarity by translation back into a more ambiguous language?

Mis-defining concepts can be extremely subtle. If you look at the allsome quantifier https://dwheeler.com/essays/allsome.html you'll see that these problems predate AI, and I struggle to see how natural language is going to help in cases like the "All Martians" case, where the confusion may be over whether Martians exist at all. Something relatively implicit.
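
In Lean terms (my own rendering, not the essay's notation):

    -- With plain ∀, "All Martians are green" is vacuously true
    -- when no Martian exists at all:
    theorem vacuous {α : Type} (Martian Green : α → Prop)
        (h : ¬ ∃ x, Martian x) : ∀ x, Martian x → Green x := by
      intro x hm
      exact absurd ⟨x, hm⟩ h

    -- Wheeler's "allsome" rules out the vacuous reading by
    -- bundling the existence claim in explicitly:
    def AllSome {α : Type} (P Q : α → Prop) : Prop :=
      (∀ x, P x → Q x) ∧ (∃ x, P x)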

Rochus 7 hours ago | parent | next [-]

We build pretty complex systems based only on "natural language" specifications. I think you are conflating specification ambiguity with verification accessibility.

> What does one gain besides familiarity by translation back into a more ambiguous language?

You gain intent verification. Formal languages are precise about implementation, but they are often opaque about intent. A formal specification can be "precisely wrong". E.g. you can write a perfectly precise Event-B spec that says "When the pedestrian button is pressed, the traffic light turns Green for cars"; the formalism is unambiguous, the logic is sound, the proof holds, but the intent is fatally flawed. Translating this back to natural language ("The system ensures that pressing the button turns the car light green") allows a human to instantly spot the error.
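
In pseudo-Event-B (illustrative only, not a checked model), the flawed spec is something like:

    event press_button ≙
        when grd1: button = pressed
        then act1: light_cars ≔ Green
        end

Every proof obligation for this discharges cleanly; only the back-translated sentence exposes that Green should have been Red.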

> All Martians are green

Modern LLMs are actually excellent at explicating these edge cases during back-translation if prompted correctly. If the formal spec allows vacuous truth, the back-translation agent can be instructed to explicitly flag existential assumptions. E.g. "For every Martian (assuming at least one exists), the color is Green", or "If there are no Martians, this rule is automatically satisfied". You are not translating back to casual speech; you are translating back to structured, explicit natural language that highlights exactly these kinds of edge cases.
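
For instance, the back-translation session can carry a standing instruction along these lines (hypothetical wording):

    # Hypothetical standing instruction for the back-translation session:
    BACK_TRANSLATION_RULES = """
    For every universally quantified statement, state explicitly whether the
    domain is assumed to be non-empty. If a rule can hold vacuously (e.g.
    because no Martian exists), spell that out in plain words rather than
    leaving it implicit.
    """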

ratmice 6 hours ago | parent [-]

Maybe it can be done, but I struggle to believe that adding in that branch for every forall quantifier (which may be plentiful in a proof) is going to help make a proof more understandable. Rather, I feel like it'll just balloon the number of words necessary to explain the proof. Feels like it's going to fall on the bad side of verbosity, as the sibling comment said.

Rochus 5 hours ago | parent [-]

I think there is a misunderstanding about what is being back-translated.

We don't back-translate the proof steps (the thousands of intermediate logical derivations). That would indeed be verbose and useless.

We back-translate the specification: the Invariants, Guards, and Events.

For a traffic light system, we don't need the LLM to explain the 50 steps of predicate logic that prove inv3 holds. We just need it to translate inv3 itself:

    Formal: inv3: light_NS = Green ⇒ light_EW = Red

    Back-translation: 'Invariant: If the North-South light is Green, the East-West light MUST be Red.'

This isn't verbose; it's exactly the concise summary of the system's safety rules. The 'verbosity' of handling edge cases (like the 'Allsome' example) only applies when the specification itself relies on subtle edge cases, in which case being verbose is exactly what you want in order to catch a hidden bug.
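
For context, a minimal Event-B-style machine sketch (notation approximated; not a checked Rodin model) shows how small that back-translated surface stays:

    MACHINE TrafficLight
    VARIABLES light_NS light_EW
    INVARIANTS
        inv1: light_NS ∈ {Green, Red}
        inv2: light_EW ∈ {Green, Red}
        inv3: light_NS = Green ⇒ light_EW = Red
    EVENTS
        ns_to_green ≙
            when grd1: light_EW = Red
            then act1: light_NS ≔ Green
            end
    END

Only these lines (three invariants, one guard, one action) need back-translating; the proof obligations discharged underneath never appear.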

ratmice 4 hours ago | parent [-]

Definitions are built up layer upon layer like an onion too, with each step adding its own invariants, reducing the problem space.

I just feel like the street light example is an extremely small, free-standing example. Most things that I feel are worth the effort of proving end up huge. Formal verification languages have forever been denigrated for being overly rigid and too verbose, and I feel like translations into natural language can only increase that if they are accurate.

One thing I wish is that this whole discussion were less intertwined with AI. The semantic gap existed before AI, and will be run into again without AI. People have been accidentally proving the wrong thing true or false forever, and will never stop, with or without AI help.

At the very least we can agree that the problem exists. While I'm skeptical of natural language as being anything but the problem we ran away from, at least you're trying something and exploring the problem space, and that can only be cheered.

Rochus 4 hours ago | parent [-]

My bet is that AI changes the economics of that verbosity, making it cheap to generate and check those 'huge' definitions layer by layer. The next four years will show.

lindenr 7 hours ago | parent | prev | next [-]

I agree: if AI (or humans) have mistranslated a natural language statement to a formal statement, we should not rely on AI to correctly translate the formal statement back into natural language.

For many statements I expect it's not possible to retain the exact meaning of the formal-language sentence without the natural language becoming at least as complex, and if you don't retain meaning exactly then you're vulnerable to the kind of thing the article warns about.

guenthert 5 hours ago | parent [-]

> if AI (or humans) have mistranslated a natural language statement to a formal statement, we should not rely on AI to correctly translate the formal statement back into natural language.

Perhaps we must not rely on it and should find a way to make sure that it cannot fail, but I'd like to point out that these are two different problems, and it seems to me that the current crop of so-called AIs is pretty good at distilling excerpts. Perhaps that's the easier problem to solve?

smarx007 7 hours ago | parent | prev [-]

> why do we invent these formal languages except to be more semantically precise than natural language

To be... more precise?

On a more serious note, I cannot recommend "Exactly: How Precision Engineers Created the Modern World" by Winchester enough. While the book talks mostly about precision in mechanical engineering, it made me appreciate _precision_ itself to a greater degree.

ratmice 7 hours ago | parent [-]

That was a rhetorical question. My point is that back-translation into natural language is translating into a less precise form. How is that going to help? No number of additional abstraction layers is going to solve human confusion.

smarx007 7 hours ago | parent [-]

Oh well, that flew over my head. You are right.