Thanks for the summary for those of us who can't watch video right now.

There are so many layers of failures that it makes you wonder how many other operations on those ships are only working because those fallbacks, automatic switchovers, emergency supplies, and backup systems save the day. We only see the results when all of them fail and the failure happens to cause an external problem big enough that we all notice.

arjie 8 hours ago | parent | next [-]

It seems to just be standard "normalization of deviance", to use the language of safety engineering. You have 5 layers of fallbacks, so over time skipping any of the middle layers doesn't cause anything to fail. So in time you end up with a true safety factor equal only to the last layer. Then that fails, and looking back, "everything had to go wrong".

As Sidney Dekker (of Understanding Human Error fame) says: Murphy's Law is wrong - everything that can go wrong will go right. The problem arises from the operators all assuming that it will keep going right.

I remember reading somewhere that part of Qantas's safety record came from the fact that at one time they had the highest number of minor issues. In some sense, you want your error detection curve to be smooth: as you get closer to catastrophe, your warnings should get more severe. On this ship, it appeared everything was A-OK till it bonked a bridge.
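To make the "smooth curve" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration (the `Severity` levels, the `severity` function, and the thresholds are hypothetical, not taken from any real ship or aircraft system). The point is only that the alarm level is driven by how many layers of defense are still healthy, not by whether the final outcome happened to be fine:

```python
from enum import IntEnum


class Severity(IntEnum):
    OK = 0
    NOTICE = 1
    WARNING = 2
    CRITICAL = 3


def severity(layers_healthy: int, layers_total: int) -> Severity:
    """Map the number of remaining defense layers to an alarm level.

    The thresholds are hypothetical; a real system would tune them.
    """
    if not 0 <= layers_healthy <= layers_total:
        raise ValueError("layers_healthy must be between 0 and layers_total")
    if layers_healthy == layers_total:
        return Severity.OK
    if layers_healthy <= 1:
        # Down to the last layer (or none): the next fault is the accident.
        return Severity.CRITICAL
    if layers_healthy <= layers_total // 2:
        return Severity.WARNING
    return Severity.NOTICE


if __name__ == "__main__":
    # Walk down from 5 healthy layers to 0 and show the escalation.
    for healthy in range(5, -1, -1):
        print(healthy, severity(healthy, 5).name)
```

With five layers, losing one or two only raises a notice, but the level climbs well before the last layer is the only thing left, which is roughly the opposite of "everything was A-OK till it bonked a bridge".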

bombcar 7 hours ago | parent [-]

This is the most pertinent thing to learn from these NTSB crash investigations: it's not what went wrong at the final disaster, but all the things that went wrong earlier without anyone detecting that they were down to one layer of defense.

Your car engaging auto brake to prevent a collision should be less a "whew, glad that didn't happen" and more an "oh shit, I need to work on paying attention more."

aidenn0 31 minutes ago | parent | next [-]

I had to disable the auto-brake from RCT[1] sensors because of too many false positives (like 3 a week) in my car.

[1]: rear cross-traffic, i.e. when backing up and cars are coming from the side.

dmurray 6 hours ago | parent | prev [-]

Why then does the NTSB pin so much blame on the single wiring issue? Shouldn't they have the context to point to the 5 things that went wrong in the Swiss cheese, and not pat themselves on the back for having found this almost-irrelevant detail:

> Our investigators routinely accomplish the impossible, and this investigation is no different...Finding this single wire was like hunting for a loose rivet on the Eiffel Tower.

In the software world, if I had an application that failed when a single DNS query failed, I wouldn't be pointing the blame at DNS and conducting a deep dive into why this particular query timed out. I'd be asking why a single failure was capable of taking down the app for hundreds or thousands of other users.
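As a rough sketch of what "a single DNS failure shouldn't take down the app" can look like in Python: the function below tries several hostnames in turn and falls back to a last-known address, logging each miss so the degradation is visible. The name `resolve_with_fallback`, its parameters, and the example hostnames are all hypothetical; only `socket.gethostbyname` and the `logging` module are real standard-library pieces.

```python
import logging
import socket
from typing import Callable, Optional, Sequence


def resolve_with_fallback(
    hosts: Sequence[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
    last_known: Optional[str] = None,
) -> str:
    """Try each hostname in order; fall back to a cached address if all fail."""
    for host in hosts:
        try:
            return resolver(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            # Log every miss: a fallback that fires silently hides the
            # fact that you are running on fewer layers than you think.
            logging.warning("lookup failed for %s: %s", host, exc)
    if last_known is not None:
        logging.warning("all lookups failed, using last known address")
        return last_known
    raise RuntimeError("all lookups failed and no cached address available")


if __name__ == "__main__":
    # Hypothetical hostnames; .invalid never resolves, so this exercises
    # the fallback path (192.0.2.1 is a documentation-only address).
    print(resolve_with_fallback(["primary.invalid"], last_known="192.0.2.1"))
```

The resolver is injectable so the failure paths can be exercised without a network, which matters here: a fallback you never test is one of those middle layers that may quietly not be there.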

plorg 5 hours ago | parent | next [-]

That seems like a difference between the report and the press release. I'm sure it doesn't help that the current administration likes quick, pat answers.

The YouTube animation they published notes that this also wasn't just one wire - they found many wires on the ship that were terminated and labeled in the same (incorrect) way, which points to an error at the ship builder and potentially a lack of adequate documentation or training materials from the equipment manufacturer, which is why WAGO received mention and notice.

bombcar 4 hours ago | parent [-]

It’s also immediately actionable, and other similar ships can investigate their wires.

toast0 5 hours ago | parent | prev [-]

The faulty wire is the root cause. If it didn't trigger the sequence of events, all of the other things wouldn't have happened. And it's kind of a tricky thing to find, so that's an exciting find.

The flushing pump not restarting when power resumed did also cause a blackout in port the day before the incident. But you know, looking into why you always have two blackouts when you have one is something anybody could do; open the main system breaker, let the crew restore it and that flushing pump will likely fail in the same way every time... but figuring out why and how the breaker opened is neat, when it's not something obvious.

nothercastle 3 hours ago | parent [-]

Operators always like to just clear the fault and move on; they are under extremely high pressure to make schedule and have little incentive to work safely.

crote 8 hours ago | parent | prev | next [-]

Oh, it gets even worse!

The NTSB also had some comments on the ship's equivalent of a black box. Turns out it was impossible to download the data while it was still inside the ship, the manufacturer's software was awful and the various agencies had a group chat to share 3rd party software(!), the software exported thousands of separate files, audio tracks were mixed to the point of being nearly unusable, and the black box stopped recording some metrics after power loss "because it wasn't required to" - despite the data still being available.

At least they didn't have anything negative to say about the crew: they reacted timely and adequately - they just didn't stand a chance.

nothercastle 3 hours ago | parent | next [-]

It’s pretty common for black boxes to be load shed during an emergency. Kind of funny how that was allowed for a long time.

MengerSponge 2 hours ago | parent | prev [-]

"they reacted timely and adequately" and yet: they're indefinitely restricted (detained isn't the right word, but you get it) to Baltimore, while the ship is free to resume service.

haddonist 2 hours ago | parent | prev | next [-]

One of the things Sal Mercogliano stressed is that the crew (and possibly other crews of the same line) modified systems in order to save time.

Rather than going through the process of purging high-sulphur fuel that can't be used in USA waters, they had it set so that some of the generators were fed from USA-approved fuel, which compromised redundancy & automatic failover.

It seems probable that the wire failure would not have caused catastrophic overall loss of power if the generators had been in the normal configuration.

dboreham 4 hours ago | parent | prev [-]

Also the zeroth failure mode: someone built a bridge that will collapse if any of the many, many large ships that sail beneath it can't steer itself with high precision.

foobar1962 3 hours ago | parent [-]

Ships were a lot smaller when the bridge was designed and built.