NitpickLawyer 2 days ago

> In the case where the software is life and death it's generally developed in ways similar to 'real' engineering

I think even that is highly romanticised by people. Take Boeing's two big blunders in recent years: the Max and Starliner both had terrible software practices that were nonetheless "by the book". The problem is that "the book" really changed a lot, and the behemoths haven't kept up.

It used to be that "the NASA way" was the most reliable way to build software, and there are still articles and blog posts shared here about the magical programming "rules", but frankly the industry has left them behind.

On the Starliner project, Boeing was seen as the "stable, reliable, tried and true" partner, while Dragon was seen as the "risky" one. Yet Boeing, doing everything by the book, had two potential loss-of-crew problems on its first uncrewed demo flight, one relating to a software timer, and its crewed demo flight was plagued by problems root-caused to a lack of integration testing. Again, everything by the book, and they failed multiple times on the happy path! The book has to be rewritten, taking into account what the software industry (tech) has "learned" over the past decades. Just think about the man-hours and billions that have gone into tech software engineering, and it's obvious NASA can't keep up.

wavemode 2 days ago | parent [-]

I think, rather, you're romanticizing what "real" engineering looks like.

Real engineering doesn't mean that mistakes are never made or that there are never bugs. Rather, it is that systems are tested thoroughly enough, and designed with enough failsafes and redundancy, that safety concerns are mitigated.
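To make the "redundancy" part concrete, here's a minimal sketch (hypothetical code, not any actual avionics software — the function name and tolerance are made up for illustration) of a classic 2-out-of-3 voter: three independent sensor channels feed a median select, so a single failed channel can't steer the output, and disagreeing channels get flagged for a higher-level monitor.

```python
def vote(readings, tolerance):
    """Median-select across three redundant channels.

    A single wildly-wrong channel cannot win: with three inputs,
    the median always lies between the two agreeing values.
    """
    if len(readings) != 3:
        raise ValueError("expects exactly three redundant channels")
    selected = sorted(readings)[1]  # median of three
    # Flag channels that disagree with the selected value beyond tolerance,
    # so a separate monitor can raise a maintenance/crew alert.
    suspect = [i for i, r in enumerate(readings) if abs(r - selected) > tolerance]
    return selected, suspect

# One good pair outvotes a failed channel; channel 2 gets flagged:
value, suspect = vote([5.1, 5.0, 42.0], tolerance=0.5)
```

The point isn't the ten lines of code — it's that the design assumes a channel *will* fail and stays safe anyway.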

The problem in the Boeing case was not that the software had bugs. Lots of aviation software has bugs; it's actually very common. Rather, the problem was that they did not design the system to be safe in the event that a bug occurred.

How that looks exactly tends to differ depending on the system. As a common example, many aircraft systems have other systems which monitor them and emit a warning if they detect something that doesn't make sense. In the Max's case, though, that would've required Boeing to create technical material for pilots on how to respond to a new type of warning, which would've required training updates, which would've required recertification of their plane design — a cost Boeing desperately wanted to avoid. Fortunately (unfortunately), FAA oversight had become very lax, so Boeing instead just downplayed the safety concerns and nobody asked any questions.
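That kind of independent monitor can be sketched in a few lines (again, a hypothetical illustration, not Boeing's design — the names, threshold, and debounce count are invented): compare two channels that should agree, and latch a warning only on *sustained* disagreement, so transient glitches don't trip it.

```python
def disagree_monitor(left, right, threshold, min_samples=3):
    """Independent cross-check of two channels that should agree.

    Latches a warning only after `min_samples` consecutive
    disagreements, debouncing transient glitches. The monitor only
    alerts the crew; it never touches the control logic itself.
    """
    streak = 0
    for l, r in zip(left, right):
        streak = streak + 1 if abs(l - r) > threshold else 0
        if streak >= min_samples:
            return "DISAGREE"  # crew alert
    return "OK"
```

A sustained split between the two channels trips the alert; momentary noise does not. The warning itself is cheap — the expensive part, as described above, is the training and recertification that a new alert drags in.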

marcosdumay 2 days ago | parent [-]

> Rather, it is that systems are tested thoroughly enough, and designed with enough failsafes and redundancy

Yeah... That's one of the main reasons why engineers from most other disciplines have a lot of difficulty creating large reliable software.

Testing systems and adding failsafes are not nearly enough for system reliability, and not the best way to add it to software. They're almost enough for mechanical engineering... Almost, because they're not enough to make human interactions safe.

wavemode 2 days ago | parent [-]

> Testing systems and adding failsafes are not nearly enough for system reliability, and not the best way to add it to software.

Then what is the best way?

marcosdumay a day ago | parent [-]

Just as with systems reliability in general, nobody knows the best way well. But some things we do know: keeping it simple quickly becomes more impactful than adding failsafes, and you need to design quality holistically from the top down, not focus on failures or known failure modes.

And to be clear: the fact that nobody really knows how to engineer systems reliability is the main reason we have a duopoly in large airplanes that everybody expects to turn into a monopoly soon. If we knew how to do it, companies would pop into the market all the time.