arjie 8 hours ago

It seems to just be standard "normalization of deviance," to use the language of safety engineering. You have 5 layers of fallbacks, so skipping any of the middle layers doesn't cause anything to fail. Over time you end up with a true safety factor equal only to the last layer. Then that fails, and looking back, "everything had to go wrong."
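A toy sketch of that point (my own illustration, with made-up numbers, not from the thread): if every layer independently catches a fault with high probability, the stack looks extremely safe, but once the middle layers have quietly rotted away, the true failure rate is just that of the last layer, and nothing in day-to-day operation reveals the difference.

```python
# Toy defense-in-depth model. Hypothetical numbers: assume each layer
# independently FAILS to catch a fault with probability 0.1.

def p_catastrophe(layer_count, p_layer_fails=0.1):
    """Probability a fault slips through every remaining layer
    (assuming the layers fail independently)."""
    return p_layer_fails ** layer_count

# All 5 layers intact: a fault slips through ~1 in 100,000 times.
healthy = p_catastrophe(5)

# After normalization of deviance silently disables the middle layers,
# the true safety factor is only the last layer: ~1 in 10.
degraded = p_catastrophe(1)

# Crucially, on any ordinary day both systems behave identically,
# which is why the degradation goes unnoticed until the last layer fails.
print(healthy, degraded)
```

The independence assumption is generous; correlated failures (a common power bus, a shared crew habit) make the healthy number worse, which only strengthens the point.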

As Sidney Dekker (of The Field Guide to Understanding Human Error fame) says: Murphy's Law is wrong; everything that can go wrong will usually go right. The problem arises when the operators all assume it will keep going right.

I remember reading somewhere that part of Qantas's safety record came from the fact that at one time they reported the highest number of minor issues. In some sense, you want your error-detection curve to be smooth: as you get closer to catastrophe, your warnings should get more severe. On this ship, everything appeared A-OK until it bonked a bridge.

bombcar 7 hours ago | parent

This is the most pertinent lesson from these NTSB crash investigations: it's not just what went wrong at the final disaster, but all the earlier failures where nobody detected that they were down to one layer of defense.

Your car engaging auto-brake to prevent a collision shouldn't be a "whew, glad that didn't happen" moment but an "oh shit, I need to work on paying attention more."

aidenn0 28 minutes ago | parent | next

I had to disable the auto-brake from the RCT[1] sensors in my car because of too many false positives (about three a week).

1: rear cross-traffic, i.e., warning when you're backing up and cars are approaching from the side.

dmurray 5 hours ago | parent | prev

Why then does the NTSB place so much blame on the single wiring issue? Shouldn't they have the context to point to the 5 things that went wrong in the Swiss cheese, rather than pat themselves on the back for having found the almost-irrelevant detail of

> Our investigators routinely accomplish the impossible, and this investigation is no different...Finding this single wire was like hunting for a loose rivet on the Eiffel Tower.

In the software world, if I had an application that failed when a single DNS query failed, I wouldn't be pointing the blame at DNS and conducting a deep dive into why this particular query timed out. I'd be asking why a single failure was capable of taking down the app for hundreds or thousands of other users.
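To make the software analogy concrete, here is a hypothetical sketch (the helper name and parameters are mine, not from the comment) of the fix the commenter implies: instead of letting one failed lookup take down the request path, retry with backoff and then fall back to a cached address.

```python
import socket
import time

def resolve_with_fallback(host, cached=None, attempts=3, base_delay=0.1):
    """Hypothetical sketch: don't let a single failed DNS query
    become a single point of failure. Retry with exponential backoff,
    then fall back to a previously cached address if one exists."""
    for attempt in range(attempts):
        try:
            # First resolved address; sockaddr tuple's first element is the IP.
            return socket.getaddrinfo(host, None)[0][4][0]
        except socket.gaierror:
            time.sleep(base_delay * (2 ** attempt))
    if cached is not None:
        return cached  # possibly stale, but keeps the app serving users
    raise RuntimeError(f"DNS resolution failed for {host} and no cached address")
```

The design choice mirrors the thread's point: the cached-address fallback is a second "layer," and the real discipline is alerting every time it gets used rather than silently living off it.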

plorg 5 hours ago | parent | next

That seems like a difference between the report and the press release. I'm sure it doesn't help that the current administration likes quick, pat answers.

The YouTube animation they published notes that it wasn't just one wire: they found many wires on the ship terminated and labeled in the same (incorrect) way. That points to an error at the shipbuilder, and potentially to inadequate documentation or training materials from the equipment manufacturer, which is why WAGO received mention and notice.

bombcar 4 hours ago | parent

It’s also immediately actionable: crews of similar ships can inspect their own wiring.

toast0 5 hours ago | parent | prev

The faulty wire is the root cause: if it hadn't triggered the sequence of events, none of the other failures would have happened. And it's a tricky thing to find, so it's an exciting find.

The flushing pump failing to restart when power resumed also caused a blackout in port the day before the incident. But looking into why you always get two blackouts when you have one is something anybody could do: open the main system breaker, let the crew restore power, and that flushing pump will likely fail the same way every time. Figuring out why and how the breaker opened in the first place is the neat part, when the cause isn't obvious.
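A toy simulation of that mechanism (my own simplification of the comment's description, not the NTSB's model): the pump's starter drops out on power loss and does not re-latch when power returns, so restoring power without a manual reset reliably leads to a second blackout.

```python
# Toy model: a motor starter that latches out on power loss and
# requires a manual reset, per the behavior described in the comment.

class FlushingPump:
    def __init__(self):
        self.running = True

    def on_power_loss(self):
        self.running = False  # contactor drops out when power is lost

    def on_power_restore(self):
        pass  # no auto-restart: the latch stays open until a manual reset

def blackout_cycle(pump):
    """Simulate one blackout/restore cycle."""
    pump.on_power_loss()
    pump.on_power_restore()
    # Without the pump running, the generator soon trips again.
    return "second blackout" if not pump.running else "recovered"

print(blackout_cycle(FlushingPump()))  # → second blackout
```

This is why the commenter calls the second blackout reproducible: the latch behavior is deterministic, so opening the breaker yourself would demonstrate it every time.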

nothercastle 3 hours ago | parent

Operators always like to just clear the fault and move on; they're under extremely high pressure to make schedule and have little incentive to work safely.