new | show | ask | jobs Github

keybored 8 months ago

Why is verification excessive but not tests?

A verification of a property is stronger than a mere test of a property.

▲ worldsavior 8 months ago | parent [-]

The test is supposed to be the verification.

▲ GolDDranks 8 months ago | parent | next [-]

Test verify that the code works on specific inputs. Formal verification checks that it works on every input.

▲

worldsavior 8 months ago | parent [-]

Can't you make the tests check every input? (This is also what they're supposed to do.)

▲

imtringued 8 months ago | parent | next [-]

I don't know why you are trolling us. Tests aren't supposed to check every input. The entire point of classic testing is to define static execution traces by hand. That is obviously meant to only cover a finite number of execution traces. Even 100% test coverage doesn't give you proof over all possible execution traces. Tests prove that the software "works", but they do not prove the absence of bugs (which inherently requires you to specify what a "bug" is, hence the need for a specification).

Not only is static verification more powerful, there is also a massive usability difference. You define your pre and post conditions in each function (also known as specification or design contract) and the tool will automatically check that you do not violate these conditions. It solves a very different problem from classic unit or integration tests.

▲

sangel 8 months ago | parent | prev [-]

Obviously not. Suppose the input to my function is a 64-bit integer. My test cannot possibly try every possible 64-bit integer. It would take years for such a test to finish.

This is why tools like formal verification and symbolic analyses can help you establish that for all possible integers X, your function does the right thing (for some definition of “right”). You get this assurance without having to actually enumerate all X.

	▲	stephencanon 8 months ago \| parent [-]
		Indeed. Exhaustively testing a 64b input space requires about 600 machine years for each nanosecond that the function under test takes to run. So, _very_ short-running critical functions are testable on a cluster, but not efficiently testable, and if a function is simple enough to be testable, then it's usually easier to prove that it's correct instead.

▲ nextaccountic 8 months ago | parent | prev | next [-]

But why? Tests can't catch everything. A single verified predicate is equivalent to a very large, potentially infinite number of tests.

Right now the Rust stdlib is being verified using Kani, a model checker, https://model-checking.github.io/verify-rust-std/

In Kani, a proof looks like this

https://github.com/model-checking/verify-rust-std/blob/00169...

    #[kani::proof_for_contract(NonNull::new_unchecked)]
    pub fn non_null_check_new_unchecked() {
        let raw_ptr = kani::any::<usize>() as *mut i32;
        unsafe {
            let _ = NonNull::new_unchecked(raw_ptr);
        }
    }

It looks like a test, but actually it is testing that every possible usize, when converted to a pointer to i32 and built with NonNull::new_unchecked, will follow the contract of NonNull::new_unchecked, which is defined here

https://github.com/model-checking/verify-rust-std/blob/00169...

    #[requires(!ptr.is_null())]
    #[ensures(|result| result.as_ptr() == ptr)]

Which means: if the caller guarantees that the parameter ptr is not null, then result.as_ptr() is the same as the passed ptr

That's a kind of trivial contract but Kani tests for all possible pointers (rather than some cherry picked pointers like the null pointer and something else), without actually brute-forcing them but instead recognizing when many inputs test the same thing (while still catching a bug if the code changes to handle some input differently). And this approach scales for non-trivial properties too, a lot of things in the stdlib have non-trivial invariants.

You can check out other proofs here https://github.com/search?q=repo%3Amodel-checking%2Fverify-r...

It's not that different from writing a regular test, it's just more powerful. And you can even use this #[requires] and #[ensures] syntax to test properties in regular tests if you use the https://crates.io/crates/contracts crate.

Really if you have ever used the https://proptest-rs.github.io/proptest/intro.html or the https://crates.io/crates/quickcheck crate, software verification is like writing a property test, but rather than testing N examples generated at random, it tests all possible examples at once. And it works when the space of possible examples is infinite or prohibitively large, too.

▲ keybored 8 months ago | parent | prev [-]

Say I have as input a byte. I create a test that exercises every possible byte value.

A verification would be the equivalent of that. In practice that matters since the input space is often much larger than just one byte.