Remix.run Logo
brap 4 hours ago

Whenever I read about formal specs it always seems to me like “write the same tests just in a different way”, or worse, “write the same implementation but in a different way”.

I guess doing things twice can help catch errors, but I fail to see what’s so special about formal specs if they can suffer from the exact same bugs as the tests/implementation.

I guess the root of the problem is if you want to formally prove that a program does something, you have to be very specific (heh) about what that something is, at which point you are basically just writing tests/implementation all over again.

I have been looking into this on and off for years now, and I keep feeling like I’m missing something, but I don’t know what it is.

Can anyone enlighten me?

y1n0 4 hours ago | parent | next [-]

I don’t write software, but in asic and fpga design, formal method specifications enable the use of things like SAT solvers to determine if the underlying test article meets the specification.

You specify properties of the design and the tooling tests the design in a variety of ways to see if it can violate those properties.

Let’s say you have a system that controls turn signals. You can specify properties that say things like lanes that cross each other can’t both have a green light at the same time or even within some period of time of each other.

The tooling can exhaustively check the design for this and surface code traces that violate it.

jlebar 4 hours ago | parent | prev | next [-]

> Whenever I read about formal specs it always seems to me like “write the same tests just in a different way”, or worse, “write the same implementation but in a different way”. [...] Can anyone enlighten me?

A big difference is that formal methods allow you to use the "for all" quantifier.

For example, you might write a unit test that says "foo('abc') returns a string with no trailing whitespace".

But with formal methods, you can prove that "for any input x, foo(x) returns a string with no trailing whitespace".

This is a trivial example, but you could imagine something more complicated, such as "for any program P, compile(P) has the same behavior as P".

Of course, you have to define what "has the same behavior as" means!

brap an hour ago | parent [-]

>Of course, you have to define what "has the same behavior as" means

And that's really my issue, for example when you define "has no trailing whitespace", you are basically writing a piece of the implementation. Cover all behaviors, and you have basically re-implemented the function, no?

In other words, if I have the full formal spec of f(), isn't that the same thing as having f()?

bluGill 36 minutes ago | parent | next [-]

> In other words, if I have the full formal spec of f(), isn't that the same thing as having f()?

In some cases, however quite often, the spec is much simpler. For instance, it's easy to say that after running sort on some list, that the result is sorted. However, it is very hard to come up with the algorithm to do that from the specification. Sometimes that is even a point. Bubble sort, quick sort, tim sort, we can go on and on. There's a huge number of different sorts that computer science have discovered over the years. They all should have the same result and so you should be able to prove they do the same thing. However, in the real world there are often reasons you would prefer one to another despite all meeting the same spec.

brap 17 minutes ago | parent | next [-]

This makes a lot of sense, thank you!

pydry 21 minutes ago | parent | prev [-]

I did some of this stuff in college and what bugged me was that the spec actually ended up more complex than the code and it had bugs.

That was a long time ago and they said that formal methods were the future back then, too.

chadgpt3 13 minutes ago | parent [-]

It's possible for that to happen but probably means either the function is too trivial or you're missing some abstraction in the spec

chadgpt3 14 minutes ago | parent | prev | next [-]

Non trailing whitespace means the string doesn't end with a space. But foo is a function that converts an AST to a string, that's totally different. Or it's a function that loops until \0 and changes all spaces to +

The spec should be a summary of what the impl is supposed to do. You'd want more than just doesn't end with whitespace of course.

36 minutes ago | parent | prev [-]
[deleted]
chadgpt3 16 minutes ago | parent | prev | next [-]

For trivial functions they can be. Function: isPrime. Impl: loop to sqrt(n) and check divisibility. Spec: returns true if nothing divides. Impl: return true at the end if we didn't return false.

It's not exactly the same because your spec says nothing divides, not just up to sqrt(n). It's definitely not a test if you do it right.

jaggederest 4 hours ago | parent | prev | next [-]

It's more than just "write the same tests just in a different way", because each test is largely independent, or grows to unmanageably large, and you have to do the work of figuring out the completeness of your test suite by e.g. branch coverage or other relatively crude overlap methods.

Statically proven type systems let you do this in a way that each contributing piece is checked in advance against all the other pieces, guaranteeing not just "this test passes" but, in combination, "all these tests create a reasonable, coherent whole", and it disallows incoherent models from being implemented in the code. An example of this is like Rust's match, which requires complete coverage of all the possible branches, but writ large across an entire codebase.

You're right that it does nothing for you if you make a fundamental logic or theory error, it can't catch that kind of error, but you'd be surprised how much more robust it is than "merely" e.g. Haskell's type system + ad hoc functional testing + property testing - which I would consider a quite strong system overall, but formally proven e.g. cryptography is a much higher bar.

comonoid 2 hours ago | parent | prev | next [-]

The book "Program = Proof" by Samuel Mimram starts with a formula which is true for all n below

n = 15 341 178 777 673 149 429 167 740 440 969 249 338 310 889

I don't think you can catch it with any test suite.

pdhborges an hour ago | parent [-]

If you have an int32 or less you can!

IshKebab 22 minutes ago | parent | prev [-]

That can definitely be a problem, and I have definitely written formal specs (for hardware in SVA) where I'm like "I'm just implementing this again". And actually that's a totally valid formal verification task to do - formal equivalence checking of an implementation against a model (which is just another implementation). I can feel (and sometimes is) tautological.

However you usually try to not do that. If possible (and it usually is possible) your formal spec will be either:

1. Another implementation, but a much simpler one. For example you can formally verify the equivalence of a pipelined dual issue CPU, with a combinatorial single cycle model.

2. More general properties that much be true about the implemention, rather than exactly what the implementation must be. The classic example is compression: decompress(compress(x)) == x.