| ▲ | brap 4 hours ago | ||||||||||||||||||||||||||||||||||||||||||||||||||||
Whenever I read about formal specs it always seems to me like “write the same tests just in a different way”, or worse, “write the same implementation but in a different way”. I guess doing things twice can help catch errors, but I fail to see what’s so special about formal specs if they can suffer from the exact same bugs as the tests/implementation. I guess the root of the problem is if you want to formally prove that a program does something, you have to be very specific (heh) about what that something is, at which point you are basically just writing tests/implementation all over again. I have been looking into this on and off for years now, and I keep feeling like I’m missing something, but I don’t know what it is. Can anyone enlighten me? | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | y1n0 4 hours ago | parent | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
I don’t write software, but in asic and fpga design, formal method specifications enable the use of things like SAT solvers to determine if the underlying test article meets the specification. You specify properties of the design and the tooling tests the design in a variety of ways to see if it can violate those properties. Let’s say you have a system that controls turn signals. You can specify properties that say things like lanes that cross each other can’t both have a green light at the same time or even within some period of time of each other. The tooling can exhaustively check the design for this and surface code traces that violate it. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jlebar 4 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
> Whenever I read about formal specs it always seems to me like “write the same tests just in a different way”, or worse, “write the same implementation but in a different way”. [...] Can anyone enlighten me? A big difference is that formal methods allow you to use the "for all" quantifier. For example, you might write a unit test that says "foo('abc') returns a string with no trailing whitespace". But with formal methods, you can prove that "for any input x, foo(x) returns a string with no trailing whitespace". This is a trivial example, but you could imagine something more complicated, such as "for any program P, compile(P) has the same behavior as P". Of course, you have to define what "has the same behavior as" means! | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | chadgpt3 16 minutes ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
For trivial functions they can be. Function: isPrime. Impl: loop to sqrt(n) and check divisibility. Spec: returns true if nothing divides. Impl: return true at the end if we didn't return false. It's not exactly the same because your spec says nothing divides, not just up to sqrt(n). It's definitely not a test if you do it right. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | jaggederest 4 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
It's more than just "write the same tests just in a different way", because each test is largely independent, or grows to unmanageably large, and you have to do the work of figuring out the completeness of your test suite by e.g. branch coverage or other relatively crude overlap methods. Statically proven type systems let you do this in a way that each contributing piece is checked in advance against all the other pieces, guaranteeing not just "this test passes" but, in combination, "all these tests create a reasonable, coherent whole", and it disallows incoherent models from being implemented in the code. An example of this is like Rust's match, which requires complete coverage of all the possible branches, but writ large across an entire codebase. You're right that it does nothing for you if you make a fundamental logic or theory error, it can't catch that kind of error, but you'd be surprised how much more robust it is than "merely" e.g. Haskell's type system + ad hoc functional testing + property testing - which I would consider a quite strong system overall, but formally proven e.g. cryptography is a much higher bar. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | comonoid 2 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
The book "Program = Proof" by Samuel Mimram starts with a formula which is true for all n below n = 15 341 178 777 673 149 429 167 740 440 969 249 338 310 889 I don't think you can catch it with any test suite. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | IshKebab 22 minutes ago | parent | prev [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
That can definitely be a problem, and I have definitely written formal specs (for hardware in SVA) where I'm like "I'm just implementing this again". And actually that's a totally valid formal verification task to do - formal equivalence checking of an implementation against a model (which is just another implementation). I can feel (and sometimes is) tautological. However you usually try to not do that. If possible (and it usually is possible) your formal spec will be either: 1. Another implementation, but a much simpler one. For example you can formally verify the equivalence of a pipelined dual issue CPU, with a combinatorial single cycle model. 2. More general properties that much be true about the implemention, rather than exactly what the implementation must be. The classic example is compression: decompress(compress(x)) == x. | |||||||||||||||||||||||||||||||||||||||||||||||||||||