| ▲ | danabramov 3 days ago | ||||||||||||||||
I actually want to push back a bit on this. I don’t have the context on what the author has done, but proof engineering is a bit different from software engineering. In particular, if you get definitions right and if your plan actually makes sense mathematically (i.e. the proof is possible using your planned breakdown), you have a much more linear sense of progress. You don’t have “last minute surprises” which are only uncovered when the program runs. You do have surprises with modeling (hence my caveats above). But there’s also a much clearer sense that if the “contours” of the proof plan are correct, filling them in is much more parallelizable and more straightforward than with programming. I think partially this is due to the language being pure (so there’s no surprising side effects, global state, or “time”) and partially due to the “if it typechecks, you’re done” nature of proofs. Also mathematics is (unsurprisingly) composable — more than run-of-the-mill software development. | |||||||||||||||||
| ▲ | wenc 3 days ago | parent | next [-] | ||||||||||||||||
> “if it typechecks, you’re done” nature of proofs. This resonates with me. Software engineering is more art than math -- you have to deal with corner cases, exceptions, human interactions. So much of code encodes culture, preferences, storytelling, etc. It's kind of a "procedural epistemology". You are codifying -- without correctness -- the behaviors that you know. Pure Math doesn't work like that -- it is more declarative and is built on a structure of correct statements. (numerical math is a different animal -- it's closer to engineering -- but proofs are pure math). It's way more limited and restrictive and rigorous than imperative code. It's like writing in Haskell. If it compiles in Haskell, the result might not be useful in real life, but there's a degree of consistency that languages like Python cannot enforce. But even Haskell for practical reasons only aims at consistency rather than correctness. Proof assistants like Lean and Coq go all the way in Curry-Howard correspondence. | |||||||||||||||||
| ▲ | yencabulator 9 hours ago | parent | prev | next [-] | ||||||||||||||||
The article contradicts this: > There’s often a push-pull between these tasks. To give a recent example, suppose I want to prove that a function is associative (task 4). I realize that this theorem is false—oops—because of the underlying type that I've chosen (task 5). Maybe this is a problem I can avoid by decomposing other theorems differently (task 3). But in this case, I realize it’s a hard requirement, and I need to replace the current type with something else—for example, a finite map—and then refactor the whole formalization accordingly (task 2). I also need to check whether my theory is mathematically correct, or if I’ve made some conceptual mistake which renders important theorems unprovable (task 1). > My most frustrating experiences with Lean have rarely involved proofs of individual theorems. More often, some previous decision has unexpectedly blocked me, and I’m facing a choice between several different complex refactors to fix it. This is what happened with the example above—in the end, I decided to rewrite everything with finite maps, which meant rewriting several definitions and proofs. | |||||||||||||||||
| ▲ | cole-k 2 days ago | parent | prev | next [-] | ||||||||||||||||
I don't really see there being much of a difference between the modeling surprises and last-minute surprises only uncovered when the program runs. If the gist of what you're saying is that an experienced proof engineer should have fewer modeling surprises than an experience software engineer has last-minute code execution surprises, then I can sort of agree. But the "contours" of a proof you're describing are basically analogous to decomposing an application into individual components (modules/functions) --- and if you do this and test your components you should expect your software to not have any "last minute surprises" except for modeling failures as well. I guess the point I mostly disagree with is the feasibility of what you describe here: > In particular, if you get definitions right and if your plan actually makes sense mathematically (i.e. the proof is possible using your planned breakdown), you have a much more linear sense of progress. This presupposes that you know how the proof should be written, which means that you not only have an understanding of the proof mathematically (whatever your definition of "mathematically" is), but you ALSO know how to warp that understanding into a machine proof. The author notes that the "getting the definitions right" part is slower than them writing out the definitions themselves for a variety of reasons which ultimately mean they had to carefully comb through the definitions to make sure there were no logic errors. So I don't really disagree with your overall point (which is that the author can probably be trusted to say they are actually at 50%), but I think you oversell the assurances granted by formal verification when compared to software engineering. | |||||||||||||||||
| |||||||||||||||||
| ▲ | ndriscoll 2 days ago | parent | prev | next [-] | ||||||||||||||||
Your comment basically perfectly describes why Scala is a wonderfully pleasant language to work with. I could review code and know that if it had the right high level steps and if it compiled, it was probably right. When writing code you figure out the high level steps and then write whatever you need to make it type check. If you don't go out of your way to do something perverse, it's probably correct. | |||||||||||||||||
| ▲ | t_mann 2 days ago | parent | prev | next [-] | ||||||||||||||||
> You don’t have “last minute surprises” Not sure about that. One of the steps in your plan may be surprisingly hard to prove. You're going to solve the easier steps in your plan first and then may get bogged down at 98%. | |||||||||||||||||
| |||||||||||||||||
| ▲ | theendisney 2 days ago | parent | prev [-] | ||||||||||||||||
To formalize: After a few days 90% works, after a few more days 90% of the remaining 10% works, after a few more days.... long story short, it never works. | |||||||||||||||||