akoboldfrying 3 days ago

> Now you have to somehow write down what all that does.

I think the core difficulty is that there's no way to know whether your spec is complete. The only automatic feedback you can hope to get is that, if you add too many constraints, the prover can find a contradiction between them. But that's all (that I'm aware of, at least).

Let's take an extremely simple example: Proving that a sort algorithm works correctly. You think, "Aha! The spec should require that every element of the resulting list is >= the previous element!", and you're right -- but you are not yet done, because a "sorting algorithm" that merely returns an empty list also satisfies this spec.

Suppose you realise this, and think: "Aha! The output list must also be the same size as the input list!" And again, you're right, but you're still not done, because a "sorting algorithm" that simply returns inputSize copies of the number 42 also satisfies this new spec.

Suppose you notice this too, and think: "Aha! Every element in the input should also appear the same number of times in the output!" You're right -- and now, finally, your spec is actually complete. But you have no way to know that, so you will likely continue to wonder if there is some additional constraint out there that you haven't thought of yet... And this is all for one of the tidiest, most well-specified problems you could hope to encounter.
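
To make the progression concrete, here's a rough sketch in Python, with ordinary boolean predicates standing in for a prover's logic (all the names here are made up for illustration):

    from collections import Counter

    def is_nondecreasing(out):
        return all(a <= b for a, b in zip(out, out[1:]))

    def spec_v1(inp, out):   # attempt 1: output is sorted
        return is_nondecreasing(out)

    def spec_v2(inp, out):   # attempt 2: ...and same length as the input
        return spec_v1(inp, out) and len(out) == len(inp)

    def spec_v3(inp, out):   # attempt 3: ...and same multiset of elements
        return spec_v2(inp, out) and Counter(out) == Counter(inp)

    inp = [3, 1, 2]
    assert spec_v1(inp, [])                   # "return []" passes attempt 1
    assert spec_v2(inp, [42] * len(inp))      # "return 42s" passes attempt 2
    assert not spec_v3(inp, [42] * len(inp))  # only attempt 3 rules both out
    assert spec_v3(inp, sorted(inp))          # ...while accepting real sorting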

seanwilson 3 days ago | parent | next [-]

Major flaws in a specification for one function are usually picked up quickly when the proof for another function relies on the missing properties. E.g., if you only prove that the list returned by a function has a certain length, and nothing about its contents, you'll find out fast when another function needs to prove anything, even something trivial, about the contents of that list.
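
As a hedged illustration with the toy sorting predicates sketched upthread: a proof about a downstream function has to hold for every implementation the spec admits, so a conforming-but-useless implementation marks exactly where such a proof gets stuck.

    def adversarial_sort(xs):
        return [42] * len(xs)      # satisfies spec_v2, sorts nothing

    def min_via_sort(xs, sort_fn):
        return sort_fn(xs)[0]      # correct only if sort_fn really sorts

    # "min_via_sort returns the minimum" is unprovable from spec_v2 alone,
    # and this spec-conforming implementation is the witness:
    assert min_via_sort([3, 1, 2], adversarial_sort) == 42   # not the minimum!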

Not that it's flawless either, but supplementing with standard tests (e.g. unit tests) helps catch specification bugs like this, and you could also generate unit-test examples from the specification to review for correctness. You would notice problems when running the program too, just as with regular programming: you wouldn't write proofs about your program and then never try running it.
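
For instance, a single concrete example test (my_sort below is just a stand-in for whatever implementation is under test) already rejects both degenerate "sorters" that the weaker specs upthread admit:

    my_sort = sorted   # stand-in; substitute the implementation under test

    def test_sort_small_example():
        # fails for "return []" and for "return inputSize copies of 42"
        assert my_sort([3, 1, 2]) == [1, 2, 3]

    test_sort_small_example()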

Nothing is perfect, but just as in regular programming languages, where adding a few simple automated tests or some more accurate static type annotations (e.g. JavaScript to TypeScript) will catch a lot of problems compared to nothing at all (with diminishing returns), a proof of even some simple properties will flush out a lot of bugs and edge cases. So I feel the common reaction of "specifications can have flaws" is overplayed when it's clearly a big step up in the process of eliminating bugs.

Does an alternative exist that's foolproof?

learningstud a day ago | parent | next [-]

> Major flaws in a specification for one function are usually quickly picked up when the proof for another function relies on the missing specification properties

Great point! In a sense, it's testing by immediate use at compile time. I always imagine this to be the greatest productivity booster, even greater than AI. You'll notice things are wrong as you type.

akoboldfrying 3 days ago | parent | prev [-]

> Major flaws in a specification for one function are usually quickly picked up when the proof for another function relies on the missing specification properties

Great point!

> supplementing with standard e.g. unit tests helps catch specification bugs like this

I don't think it does -- any specific input+output pair that a unit test tests for will satisfy all the constraints of a complete spec, so it will necessarily also satisfy an incomplete spec (i.e., a subset of those constraints). You could detect overly strong spec constraints this way, but not insufficiently strong ones.
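
A quick check with the toy predicates from upthread makes the asymmetry concrete:

    # The correct pair satisfies every constraint of the complete spec,
    # hence also every weaker subset of it:
    inp, out = [3, 1, 2], [1, 2, 3]
    assert spec_v1(inp, out) and spec_v2(inp, out) and spec_v3(inp, out)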

IanCal 3 days ago | parent [-]

I think you’re picturing the unit-test case the opposite way around from me, but what I see relies on having something generated for you. While a unit test's input/output pair will also satisfy the weak specs, there exist spec-meeting implementations that don't pass the test.

So that requires either "generate valid code and let's test it", or being able to write a proof statement like:

If there is at least one implementation which is valid for the following properties but does not meet this single property (a fixed input/output pair), then the specifications are underdefined.

akoboldfrying 3 days ago | parent [-]

> there exist spec-meeting implementations that don't pass the test.

Ah, I see what you mean. Yes, in practice you would have an actual implementation, and if it failed the test but passed the spec you would know the spec was underspecified. I stand corrected.

> If there is at least one implementation which is valid for the following properties but does not meet this single property (a fixed input/output pair), then the specifications are underdefined.

Clever! But I guess we're probably some way from being able to existentially quantify over all possible implementations... It might be that the only satisfying implementation is one for which we cannot even prove termination. More optimistically, maybe it turns out that whenever there is a satisfying implementation, there must be one in some simple class (like straight-line programs) that can be checked exhaustively.
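
A hedged toy version of that optimistic case, reusing the spec_v2/spec_v3 predicates from upthread, with a tiny hand-picked class of candidates standing in for an exhaustive search:

    candidates = [
        lambda xs: [],                # degenerate
        lambda xs: [42] * len(xs),    # degenerate
        lambda xs: sorted(xs),        # the intended behaviour
    ]

    def spec_is_underspecified(spec, test_in, test_out):
        # Does some candidate satisfy the spec yet disagree with a
        # known-good input/output pair?
        return any(spec(test_in, f(test_in)) and f(test_in) != test_out
                   for f in candidates)

    assert spec_is_underspecified(spec_v2, [3, 1, 2], [1, 2, 3])      # too weak
    assert not spec_is_underspecified(spec_v3, [3, 1, 2], [1, 2, 3])  # tight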

igornotarobot 3 days ago | parent | prev | next [-]

Producing positive and negative examples is exactly where model checkers shine. I always write "falsy" invariants to produce examples of the specification reaching interesting control states for at least one input. After that, I think about the system boundaries and where it should break. Then, the model checker shows that it indeed breaks there.
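
Model checkers do this natively; as a rough Python illustration of the trick, here's a toy explicit-state checker where asserting the negation of "state 3 is reachable" hands back a witness trace:

    def successors(n):
        # toy transition relation over a bounded counter
        return [n + 1, n + 2] if n < 5 else []

    def check(init, invariant):
        """Breadth-first search; return a trace violating `invariant`, or None."""
        frontier, seen = [[init]], {init}
        while frontier:
            trace = frontier.pop(0)
            if not invariant(trace[-1]):
                return trace          # the counterexample doubles as an example
            for nxt in successors(trace[-1]):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(trace + [nxt])
        return None

    # "Falsy" invariant: claim the interesting state 3 is unreachable...
    print(check(0, lambda n: n != 3))     # ...and get a witness trace: [0, 1, 3]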

Having a specification does not mean that one should not touch it. It is just a different level of experimental thinking.

wakawaka28 2 days ago | parent | prev | next [-]

I think you're right, but technically, having too many constraints doesn't mean the spec is wrong: some may be redundant, and that can be OK or even helpful. A lack of contradictions doesn't mean it's right, either. I would argue that the problem of not knowing whether you've specified all the constraints is the same as not knowing whether you've specified all the generic requirements. It's more work to formally specify anything, and the constraints are harder to casually interpret, but in either case doneness is more of a continuum than a binary attribute.

scrubs 3 days ago | parent | prev [-]

And ... so what? Nobody ever said specs are error-free or complete. Heck, you didn't even get into liveness or deadlock issues.

The salient question: is risk reduced for the time allotted to write a spec in, say, Spin or TLA+?

Formal specs are risk reduction, not magic.

akoboldfrying 3 days ago | parent [-]

> And ... so what?

If you don't think clearly elucidating the specific issue holding back wider adoption of an otherwise amazing technology is relevant to this discussion, I don't know what to tell you.

scrubs 2 days ago | parent | next [-]

I'm currently writing a formal model for a packet protocol in TLA+ (I started with Spin), and I'm writing a paper with my findings, including some thoughts on why FM are not more widely adopted. Here, academics and FM system devs have work to do. But let me tend to the other side, industrial developers: FM requires attention to detail in order to leverage the power of LTL etc. You mention a few things about sorting which are important details. Too many devs, I think, are a bit lazy here. HW does better as a discipline ... software devs, not so much. FM, more than vanilla coding, requires attention to many small issues, e.g. fairness. I think that too easily scares off some devs. Also, writing a TLA+ model requires a different mindset from imperative coding. Devs see that and are too often allergic to learning it.

cwillu 3 days ago | parent | prev [-]

Indeed, it's a microcosm of the same problem.