Remix.run Logo
dfabulich 7 hours ago

Formal verification is still too limited to be useful for most app developers. The article gives an example of an e-commerce platform using it to prove the correctness of managing refunds, but then acknowledges:

> As of today, the formally verified core can handle most effect-free logic—invariants, transitions, conflict resolution. But the UI, network calls, and database interactions typically sit outside the verification boundary. Verification makes the core airtight but doesn’t guarantee end-to-end correctness.

So you can formally prove that your e-commerce refund management logic is correct, except for proving that you processed the refund. You can't even prove anything about recording the refund in your database, say nothing of proving anything about your interactions with your payment processor.

If your app is mostly tricky logic with just a bit of I/O, your app is very unusual, and it's almost certainly not an e-commerce app. E-commerce apps are mostly CRUD apps; I/O with the database, the UI, and third-party APIs (e.g. payment processors) is 99% of the code.

Even property-based testing is mostly unhelpful for e-commerce apps like these.

Instead, think of formal verification as a runtime performance improvement of property-based testing. If property-based testing is useful for your app (it probably isn't), then you may be able to convert some of your property-based tests into formal verifications.

But, honestly, you probably can't do it, not even with a high budget of tokens.

I'd love to be proven wrong, but the way to do it would be to formally prove the correctness of non-trivial open-source code with property tests. Perhaps you could formally verify significant chunks of Postgres! (But I doubt it.)

teach 6 hours ago | parent | next [-]

So much this.

I actually did take a formal verification course in college. Our final project was to use the techniques we'd been learning to verify some classic critical-section locking algorithm. I chose to verify an implementation of Lamport's bakery algorithm[0] in C (this was the 90s -- a lot of code was still being written in C).

The problem is that Lamport's algorithm makes an assumption that the "ticket number" is unbounded and any finite implementation in C will almost certainly use a value which is limited to 32 bits or so.

So I was able to formally verify that the algorithm fails to protect the critical section if enough processes are kept waiting to overflow the counter. :)

This probably just means that Lamport's algorithm isn't a great choice for such environments, but I'm still bummed that the professor gave me a B.

[0] https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm

agentultra 2 hours ago | parent | prev | next [-]

If you are willing to relax the restrictions, and you probably should, model checking is probably worth its weight in gold for these scenarios.

You won’t get proofs but you will spell out your logic in a formal language[0] and each run of the checker will exhaustively check your invariants[1].

[0] Useful because often you will learn something you hadn’t considered.

[1] A proof will guarantee your statements hold over quantifiers that are much too large for a model checker to check exhaustively. But, you can say that for a model of size N, property Y is guaranteed to hold. The “small model theorem,” posits that if there is an error in your specification, it is more likely to show up in a small model. You sacrifice the completeness of proofs but this trade-off has been worth it to me.

gf000 6 hours ago | parent | prev | next [-]

Well, I'm someone who barely knows more than jack about formal verification, but in pretty much every case you have to have some kind of model that you are actually verifying.

How close that model sits to the real thing you have modeled is an important question, and you are free to be as close or distant as you want -- e.g. for verifying different properties of a programming language you might decide to not care about CPU instructions, registers, etc, and only care about the semantic model. This has absolutely many use cases (e.g. whether a particular optimization is sound) where this "model mismatch" doesn't matter, this doesn't make formal verification useless in any way or shape, imo.

Getting back to at the "e-commerce refund management" -- you can absolutely have a model that does e.g. a particular database IO call that either succeeds or not. With such a model in place, you can have the rest of your codebase formally verified and know that 'with a properly working database it will always work correctly' [1]. Is that not a very significant and useful finding in and of itself? Would you be more confident in your end-to-end tested software than the above?

Especially that one can then separately test that particular call site as deeply as they want, to determine that the assumed property (it either returns success or fails) is sound.

[1] Given a correct specification, which is not easy to get right

dfabulich 5 hours ago | parent [-]

Formal verification is a siren song. The siren sings, "bug-free code is possible in principle!" But it's a trap. Even with LLMs, bug-free code is impractical.

I argued that property-based testing is mostly unhelpful for e-commerce/CRUD apps, and that formal verification is a performance improvement on property-based tests.

In a property-based test, you identify some rule (an invariant) that you want to apply to your code. Then, you fuzz your app, testing it with autogenerated inputs, failing the test if the rule is broken at any point. In formal verification, you prove that the code always satisfies the rule, so you don't have to try millions of inputs.

Whether you're doing property-based testing or formal verification, it's extremely difficult to think of any non-trivial business logic properties that should apply to CRUD apps, even if they could be written in English, translated perfectly into code, and verified formally, instantly.

An actual rule that should always be followed, inflexibly, such that a mathematical proof would be useful (and that actually matters to your business) is so rare in CRUD apps that I'm not sure I've ever seen one.

Even with general-purpose rules (the app should never crash, the app should not leak memory), the property-based fuzzers tend to find bugs that have never happened in production, and probably never will. It's rarely economical for an e-commerce app developer to spend time fixing those bugs, even if finding them cost nothing at all (which is not remotely true, even with LLMs).

And what about UI? Maybe you'd want a rule like: "The title of the product for sale should never overflow its container rectangle in the UI."

OK, well, what if the title is one very long word? But… none of the products you sell happen to contain any words that are 500 characters with no spaces. I guess you could add code to prevent that product from ever being created? (And ensure that data in the database will never allow product titles that violate your business rules… how, exactly?)

Formal verification shines where property-based testing is already useful. It's already useful for many software platforms. It's useful for databases, where reliability is essential. It's useful for parsers, particularly when you expect the end user to be attempting to send you hostile code.

But e-commerce apps? CRUD apps? Not so much.

TacticalCoder 2 hours ago | parent | next [-]

> It's already useful for many software platforms. It's useful for databases, where reliability is essential. It's useful for parsers, particularly when you expect the end user to be attempting to send you hostile code.

> But e-commerce apps?

So the e-commerce app doesn't use a DB?

And the e-commerce app receives zero user input that needs to be parser so doesn't use any parser either?

If you say its useful for, for example, DB and parsers, then any app using these benefits from it.

dfabulich an hour ago | parent [-]

Read carefully. There's a big difference between developing a database and developing an app that uses a database.

If you're developing a database, you should use property-based tests to ensure that your database behaves as expected (ACID reliability, etc.). If you can formally verify parts of your database, you won't have to fuzz it, because it will have been proven correct. But if you're developing a CRUD app that uses a database, there may be no properties of your app at all that are worth fuzzing.

Similarly, if you write a library that parses Markdown, you should write property-based tests of it, and fuzz it. If you use a library that parses Markdown, you should pass it your Markdown, and let the library handle it.

E-commerce apps typically don't need to do any non-trivial parsing. Effect-free logic is likely to be ~1% of your code base, or less.

This is why we keep talking past each other: the techniques that make sense for real-time operating systems don't make sense for an e-commerce app, a line-of-business admin dashboard, or a single-player game.

win311fwg 3 hours ago | parent | prev [-]

> An actual rule that should always be followed, inflexibly, such that a mathematical proof would be useful (and that actually matters to your business) is so rare in CRUD apps that I'm not sure I've ever seen one.

I see them often. Nearly every CRUD app I have come across in the wild has, for example, employed some form of email validation. To your concern, the rules of email validation are well defined and are unlikely to ever change. Importantly, a large percentage of the implementations I saw got it wrong.

As a user, I have also tried to use CRUD apps that have rejected my valid email address, so this isn't even a class of problems that will never be realized in practice. It is a pain I have felt in the real world. The use of PBTs or proofs would have easily uncovered the implementation failures.

dfabulich 3 hours ago | parent [-]

The rules of email validation are not remotely well defined! Syntactic email validation is an impossibly hard problem. https://www.netmeister.org/blog/email.html

IMO the industry consensus is never to "validate" email addresses syntactically, but simply to ensure that the email address contains at least one @ and to verify the email address by emailing it an activation code.

Proofs would not have uncovered these failures. The proofs would have proved that they rejected your email address as invalid, and the developers would have patted themselves on the back for a job well done.

win311fwg 3 hours ago | parent [-]

> The rules of email validation are not remotely well defined!

RFC 5322 fully defines the structure.

> IMO the industry consensus is never to "validate" email addresses syntactically

That is true, but not because it isn't well defined, but because it is hard to get right. Keep in mind that most developers don't even know what a property-based test is, and of those that do, only a small subset of them know how to use them. If you find any testing around it at all, which is a stretch to begin with, you will be lucky to find more than a small set of common addresses without any care or concern for the complex edge cases that lead to problems like I have had as a user in the past. Encouraging developers to only validate for the presence of @ means that there is no additional room to screw things up.

But better than to rely on gimping your code to deal with developers is to use the tools at your disposal.

> and to verify the email address by emailing it an activation code.

That may also be beneficial, but not for the same reason. Not all CRUD use-cases fit that mould.

dfabulich 3 hours ago | parent | next [-]

Don't forget RFC 5321! But the RFCs are ignored in practice by all popular mail servers. There are email addresses that work in practice that don't comply with the RFCs, and there are email addresses that the RFCs permit that don't work in practice.

(This happens with a lot of standards; sometimes people just ignore them and do their own thing. Something similar has happened with SVGs.)

If you write a formal verification of a syntactical email validator that ensures that all/only RFC-compliant email addresses are valid, you'll have completely wasted your time. Don't do it. Just check for at least one @ sign, and email the address to test it.

(This is a perfect example of the trap of formal verification.)

win311fwg 2 hours ago | parent [-]

> Don't forget RFC 5321!

As long as you don't forget RFC 6531.

> But the RFCs are ignored in practice by all popular mail servers.

While I agree that you can make a compelling case that sending email is the C in CRUD, usually when someone is talking about CRUD they are referring to systems that satisfy all four letters. U violates the spirit of email, and R and D is usually handled independently of the MTA. So what email servers do here is irrelevant. If you go way back up the comment chain you will even see that Postgres was specifically mentioned. Postgres doesn't care what an email server does, but it does care about data consistency.

> Just check for at least one @ sign

That's a valid specification and in practice you are going to want to make that a PBT to ensure that your implementation actually adheres to the specification. You might try testing foo@bar.com, but what about foo\u0040bar.com? Will you think to test it too? Probably not. Will your code handle it correctly? You may be perfect, but when we get out into the general developer population where all kinds of crazy things show up when they start monkeying with your code, the answer is also probably not.

Looking simple isn’t a reason to not use the tools at your disposal, even if many won’t.

> and email the address to test it.

Poor general advice. That is expressly illegal in some jurisdictions.

Analemma_ 2 hours ago | parent | prev [-]

I'm speaking from painful experience here: if you assume RFC 5322 has anything whatsoever to do with how email addresses actually work in the wild, you're in for a world of hurt. Popular email providers don't give a shit what RFC 5322 says, and you can't either if you want to have any hope of actually sending and receiving mail. Test messages are the only way to validate an email address, period.

2 hours ago | parent [-]
[deleted]
toast0 6 hours ago | parent | prev | next [-]

The first part of formal verification is getting a formal specification. I don't know about most developers, but I rarely get a written specification for anything I work on, and when I do, it's no where near what would be needed to turn it into a formal specification.

Anyway, the specification is subject to change at the whim of a hat, so putting a lot of effort into verifying it is foolish.

I do see value in formal verification of IPC/threading communication primitives (locks, semaphores, queues, whatevs), but then formal verification usually require assumptions for hardware behavior and those aren't always correct, so. But I've never used formal methods outside exposure in an undergrad survey class, so I dunno.

derdi 4 hours ago | parent | next [-]

Nitpick: You don't necessarily need any specification at all in order to reap benefits. Formal verification languages come with a lot of conditions that your program must fulfill in order to be accepted: Every loop terminates, every object you want to read/write is non-null, every list or array access is in bounds, etc.

For example, if you load an arbitrary C program into Frama-C, you'll have tons of properties to prove before you can even think of adding your own specification. The promises you get is that the program will always terminate and will never invoke undefined behavior. These properties are extremely useful, and these requirements are unlikely to change! And yes, C is... special. But pretty much any language has lists or arrays, nullable/option types, and unbounded loops. "No crashes/panics/uncaught exceptions" should be worth it.

And then, even absent a specification, you can start modeling invariants of the system anyway. This list is always sorted, this number is always strictly positive, this data structure never contains duplicate entries, etc. Anything that today would be a comment saying: "Important! The caller must ensure that..." This can help you gain confidence in your system. And if some day a requirement comes along that really requires you to violate one of these invariants, well, you can just remove it, the same way you would remove a test that no longer reflects something that should hold.

dfabulich 3 hours ago | parent | next [-]

> "No crashes/panics/uncaught exceptions" should be worth it.

Surprisingly, no. Property-based testing and formal validation make it easy to spend tons of time and money "preventing" bugs that would never have occurred in production, especially uncaught exceptions.

There is code where strong guarantees can be worth it, (databases, platforms/operating systems, parsers accepting hostile input) but it's not most application-level code, and certainly not most e-commerce CRUD apps.

Remember, we're here to make users' lives better, not to write correct code for its own sake.

toast0 2 hours ago | parent | prev [-]

> The promises you get is that the program will always terminate

I don't want my programs to terminate though.

aezart 2 hours ago | parent | prev | next [-]

What we get typically is a second or third-hand summary of an analysis someone else did about how the current version works, with no indication of what stuff has to stay as-is in our replacement and what stuff can be improved.

AnimalMuppet 4 hours ago | parent | prev [-]

At my last job, I got a formal specification for one project. But the implementation happened in parallel for time reasons, so by the time the spec was frozen, 90% of the code was written.

nilkn 4 hours ago | parent | prev | next [-]

There's a lot of really important software out there where being able to easily verify effect-free core logic would certainly be very useful. An e-commerce web app is not a good example. Anything safety-critical -- aerospace, defense, medical devices, power generation, industrial machines -- already requires a certification process. Auto-generating proof evidence as part of the cert process (which generally requires a rigorous spec anyway) in the near future seems like a no brainer.

inaseer 4 hours ago | parent | prev | next [-]

Have you looked at model-based testing? One way to think of it as property-based testing for stateful system, though that's underselling it a little. It's surprisingly easy to come up models/specs for most stateful systems, including CRUD apps.

Source: I've modeled a number of CRUD like and non-CRUD like systems through the Accordant framework (https://github.com/microsoft/accordant)

harperlee 6 hours ago | parent | prev | next [-]

I don't know a lot about formal verification, but:

> So you can formally prove that your e-commerce refund management logic is correct, except for proving that you /processed the refund/. You can't even prove anything about recording the refund in your database, say nothing of proving anything about your interactions with your payment processor.

You could say the same thing about the viability of functional programming on a CRUD webapp, but languages like clojure have been used to great effect here. The fact that thera are important, even fundamental, bits that you cannot verify, doesn't take out value from the fact that you can eliminate whole dimensions of issues.

tikhonj 5 hours ago | parent | prev | next [-]

Property based testing is useful for finding bugs even in these kinds of CRUD heavy apps. There can be a surprising number of bugs and unexpected behaviors in the interaction of multiple sub-systems or APIs, and one way to find those bugs is to come up with properties on traces of calls.

For example, I saw a paper on using metamorphic testing (a particular technique for defining properties to test) to find bugs in the web APIs of Spotify and YouTube[1]. I don't have time to reread the paper just now, but I remember that they found weird behavior in pagination of search results. Not a big deal in that particular case, but I've definitely seen internal APIs with behavior that could be similarly wrong but with a much larger real-world impact.

[1]: https://ieeexplore.ieee.org/document/8074764

Personally, I see property-based testing and formal specification more as tools for design and debugging more than full-on correctness. Even with AI models it's still really hard to fully prove something correct, but having even a partial logical specification can let you trade design time for debugging time and lets you find inconsistencies or potential edge-cases when you're initially figuring out a system, rather than waiting until you have a massive codebase to debug and redesign in production.

It's not a panacea and you still have to be careful at the boundary between your nicely modeled system and the real world, but, once I got the hang of working in that style, having some formal properties or partial logical specifications of the behavior I needed has been really nice to have, as much for saving effort as for ensuring correctness.

I've mostly worked in slightly different domains, but I've found property-based testing useful both as a tool to catch bugs but also as a tool for communication. I spent a couple of years building and supporting a supply chain simulation at Target, where I frequently got requests for new metrics from the supply chain planning team. By teaching them how to specify either the whole metric or, at least, some of the expected behaviors of the metric as mathematical properties, it took far fewer back-and-forth conversations to correctly implement the metric in the simulation. We could then test these things by checking these properties over a bunch of random simulation traces. Day-to-day this saved a bunch of time in debugging small simulation mistakes. In the longer-term, having this test suite also let us rewrite the simulation code in a new style in Rust to drastically increase performance. All of this would have been possible without the set of properties, it would have just involved a whole lot more slow, tedious work.

calf 2 hours ago | parent | prev | next [-]

I don't understand your argument, why precisely is contemporary app development thought to be so simple that it is not even worth thorough property testing, let alone FV? Isn't that attitude (on the part of the industry as a whole) a type of self-fulfilling prophecy?

win311fwg 4 hours ago | parent | prev | next [-]

> Formal verification is still too limited to be useful for most app developers

Limited formal verification is useful to most app developers. Consider a simplistic system that requires you to specify variables as strings or integers. You've almost certainly used a language that can express that before. Being able to formally verify that you haven't tried to stuff a string into an integer in order to give your editor the squiggles is a productivity enhancer. Or used to be in the pre-vibe coding days, at least; who knows anymore.

Your tests will eventually tell you the same thing, that is true, but the industry has decided that there is little downside to having at least some degree of formal verification and a whole lot of upsides. The question is always: where do the returns diminish? This is where you get the silly commentary on HN with the Rust guys crying out that Go isn't expressive enough and the Rocq guys laughing at it all.

microgpt 6 hours ago | parent | prev | next [-]

If you have a set of axioms that Postgres works as designed, you can prove that your code updates the database. If you define "the refund was processed" to mean "the refunded column of the order is true" you can prove that.

gdogg24 6 hours ago | parent | prev [-]

[dead]