Remix.run Logo
gf000 6 hours ago

Well, I'm someone who barely knows more than jack about formal verification, but in pretty much every case you have to have some kind of model that you are actually verifying.

How close that model sits to the real thing you have modeled is an important question, and you are free to be as close or distant as you want -- e.g. for verifying different properties of a programming language you might decide to not care about CPU instructions, registers, etc, and only care about the semantic model. This has absolutely many use cases (e.g. whether a particular optimization is sound) where this "model mismatch" doesn't matter, this doesn't make formal verification useless in any way or shape, imo.

Getting back to at the "e-commerce refund management" -- you can absolutely have a model that does e.g. a particular database IO call that either succeeds or not. With such a model in place, you can have the rest of your codebase formally verified and know that 'with a properly working database it will always work correctly' [1]. Is that not a very significant and useful finding in and of itself? Would you be more confident in your end-to-end tested software than the above?

Especially that one can then separately test that particular call site as deeply as they want, to determine that the assumed property (it either returns success or fails) is sound.

[1] Given a correct specification, which is not easy to get right

dfabulich 5 hours ago | parent [-]

Formal verification is a siren song. The siren sings, "bug-free code is possible in principle!" But it's a trap. Even with LLMs, bug-free code is impractical.

I argued that property-based testing is mostly unhelpful for e-commerce/CRUD apps, and that formal verification is a performance improvement on property-based tests.

In a property-based test, you identify some rule (an invariant) that you want to apply to your code. Then, you fuzz your app, testing it with autogenerated inputs, failing the test if the rule is broken at any point. In formal verification, you prove that the code always satisfies the rule, so you don't have to try millions of inputs.

Whether you're doing property-based testing or formal verification, it's extremely difficult to think of any non-trivial business logic properties that should apply to CRUD apps, even if they could be written in English, translated perfectly into code, and verified formally, instantly.

An actual rule that should always be followed, inflexibly, such that a mathematical proof would be useful (and that actually matters to your business) is so rare in CRUD apps that I'm not sure I've ever seen one.

Even with general-purpose rules (the app should never crash, the app should not leak memory), the property-based fuzzers tend to find bugs that have never happened in production, and probably never will. It's rarely economical for an e-commerce app developer to spend time fixing those bugs, even if finding them cost nothing at all (which is not remotely true, even with LLMs).

And what about UI? Maybe you'd want a rule like: "The title of the product for sale should never overflow its container rectangle in the UI."

OK, well, what if the title is one very long word? But… none of the products you sell happen to contain any words that are 500 characters with no spaces. I guess you could add code to prevent that product from ever being created? (And ensure that data in the database will never allow product titles that violate your business rules… how, exactly?)

Formal verification shines where property-based testing is already useful. It's already useful for many software platforms. It's useful for databases, where reliability is essential. It's useful for parsers, particularly when you expect the end user to be attempting to send you hostile code.

But e-commerce apps? CRUD apps? Not so much.

TacticalCoder 2 hours ago | parent | next [-]

> It's already useful for many software platforms. It's useful for databases, where reliability is essential. It's useful for parsers, particularly when you expect the end user to be attempting to send you hostile code.

> But e-commerce apps?

So the e-commerce app doesn't use a DB?

And the e-commerce app receives zero user input that needs to be parser so doesn't use any parser either?

If you say its useful for, for example, DB and parsers, then any app using these benefits from it.

dfabulich an hour ago | parent [-]

Read carefully. There's a big difference between developing a database and developing an app that uses a database.

If you're developing a database, you should use property-based tests to ensure that your database behaves as expected (ACID reliability, etc.). If you can formally verify parts of your database, you won't have to fuzz it, because it will have been proven correct. But if you're developing a CRUD app that uses a database, there may be no properties of your app at all that are worth fuzzing.

Similarly, if you write a library that parses Markdown, you should write property-based tests of it, and fuzz it. If you use a library that parses Markdown, you should pass it your Markdown, and let the library handle it.

E-commerce apps typically don't need to do any non-trivial parsing. Effect-free logic is likely to be ~1% of your code base, or less.

This is why we keep talking past each other: the techniques that make sense for real-time operating systems don't make sense for an e-commerce app, a line-of-business admin dashboard, or a single-player game.

win311fwg 3 hours ago | parent | prev [-]

> An actual rule that should always be followed, inflexibly, such that a mathematical proof would be useful (and that actually matters to your business) is so rare in CRUD apps that I'm not sure I've ever seen one.

I see them often. Nearly every CRUD app I have come across in the wild has, for example, employed some form of email validation. To your concern, the rules of email validation are well defined and are unlikely to ever change. Importantly, a large percentage of the implementations I saw got it wrong.

As a user, I have also tried to use CRUD apps that have rejected my valid email address, so this isn't even a class of problems that will never be realized in practice. It is a pain I have felt in the real world. The use of PBTs or proofs would have easily uncovered the implementation failures.

dfabulich 3 hours ago | parent [-]

The rules of email validation are not remotely well defined! Syntactic email validation is an impossibly hard problem. https://www.netmeister.org/blog/email.html

IMO the industry consensus is never to "validate" email addresses syntactically, but simply to ensure that the email address contains at least one @ and to verify the email address by emailing it an activation code.

Proofs would not have uncovered these failures. The proofs would have proved that they rejected your email address as invalid, and the developers would have patted themselves on the back for a job well done.

win311fwg 3 hours ago | parent [-]

> The rules of email validation are not remotely well defined!

RFC 5322 fully defines the structure.

> IMO the industry consensus is never to "validate" email addresses syntactically

That is true, but not because it isn't well defined, but because it is hard to get right. Keep in mind that most developers don't even know what a property-based test is, and of those that do, only a small subset of them know how to use them. If you find any testing around it at all, which is a stretch to begin with, you will be lucky to find more than a small set of common addresses without any care or concern for the complex edge cases that lead to problems like I have had as a user in the past. Encouraging developers to only validate for the presence of @ means that there is no additional room to screw things up.

But better than to rely on gimping your code to deal with developers is to use the tools at your disposal.

> and to verify the email address by emailing it an activation code.

That may also be beneficial, but not for the same reason. Not all CRUD use-cases fit that mould.

dfabulich 3 hours ago | parent | next [-]

Don't forget RFC 5321! But the RFCs are ignored in practice by all popular mail servers. There are email addresses that work in practice that don't comply with the RFCs, and there are email addresses that the RFCs permit that don't work in practice.

(This happens with a lot of standards; sometimes people just ignore them and do their own thing. Something similar has happened with SVGs.)

If you write a formal verification of a syntactical email validator that ensures that all/only RFC-compliant email addresses are valid, you'll have completely wasted your time. Don't do it. Just check for at least one @ sign, and email the address to test it.

(This is a perfect example of the trap of formal verification.)

win311fwg 2 hours ago | parent [-]

> Don't forget RFC 5321!

As long as you don't forget RFC 6531.

> But the RFCs are ignored in practice by all popular mail servers.

While I agree that you can make a compelling case that sending email is the C in CRUD, usually when someone is talking about CRUD they are referring to systems that satisfy all four letters. U violates the spirit of email, and R and D is usually handled independently of the MTA. So what email servers do here is irrelevant. If you go way back up the comment chain you will even see that Postgres was specifically mentioned. Postgres doesn't care what an email server does, but it does care about data consistency.

> Just check for at least one @ sign

That's a valid specification and in practice you are going to want to make that a PBT to ensure that your implementation actually adheres to the specification. You might try testing foo@bar.com, but what about foo\u0040bar.com? Will you think to test it too? Probably not. Will your code handle it correctly? You may be perfect, but when we get out into the general developer population where all kinds of crazy things show up when they start monkeying with your code, the answer is also probably not.

Looking simple isn’t a reason to not use the tools at your disposal, even if many won’t.

> and email the address to test it.

Poor general advice. That is expressly illegal in some jurisdictions.

Analemma_ 2 hours ago | parent | prev [-]

I'm speaking from painful experience here: if you assume RFC 5322 has anything whatsoever to do with how email addresses actually work in the wild, you're in for a world of hurt. Popular email providers don't give a shit what RFC 5322 says, and you can't either if you want to have any hope of actually sending and receiving mail. Test messages are the only way to validate an email address, period.

2 hours ago | parent [-]
[deleted]