▲ | charcircuit 15 hours ago | ||||||||||||||||||||||||||||||||||
Hardware verification doesn't prevent hardware failures. There is a reason RAM comes with ECC. It's not because RAM designers are too lazy to do formal verification. Even with ECC RAM, bit flips can still happen if multiple bits flip at the same time. There are also things like CPUs taking the wrong branch that occasionally happen. You can't assume that the hardware will work perfectly in the real world and have to design for failure. | |||||||||||||||||||||||||||||||||||
▲ | codebje 12 hours ago | parent | next [-] | ||||||||||||||||||||||||||||||||||
Well of course hardware fails, and of course verification doesn't make things work perfectly. Verification says the given design meets the specification, assumptions and all. When the assumptions don't hold, the design shouldn't be expected to work correctly, either. When the assumptions do hold, formal verification says the design will work correctly (plus or minus errors in tools and materials). We know dynamic RAM is susceptible to bit-flip errors. We can quantify the likelihood of it pretty well under various conditions. We can design a specification to detect and correct single bit errors. We can design hardware to meet that specification. We can formally verify it. That's how we get ECC RAM. CPUs are almost never formally verified, at least not in full. Reliability engineering around systems too complex to verify, too expensive to engineer to never fail, or that might operate outside of the safe assumptions of their verified specifications, usually means something like redundancy and majority-rules designs. That doesn't mean verification plays no part. How do you know your majority-rules design works in the face of hardware errors? Specify it, verify it. | |||||||||||||||||||||||||||||||||||
▲ | jojomodding 15 hours ago | parent | prev [-] | ||||||||||||||||||||||||||||||||||
Designing around hardware failure in software seems cumbersome to insane. If the CPU can randomly execute arbitrary code because it jumps to wherever, no guarantees apply. What you actually do here is consider the probability of a cosmic ray flip, and then accept a certain failure probability. For things like train signals, it's one failure in a billion hours. | |||||||||||||||||||||||||||||||||||
|