WorkerBee28474 6 hours ago

> Orion utilizes two Vehicle Management Computers, each containing two Flight Control Modules, for a total of four FCMs. But the redundancy goes even deeper: each FCM consists of a self-checking pair of processors.

Who sits down and determines that 8 is the correct number? Why not 4? Or 2? Or 16 or 32?

echoangle 6 hours ago | parent | next [-]

They probably set an acceptable total loss rate for the mission and worked backwards to determine how many replicas of each system they need to achieve that while minimizing total cost/weight.

So the answer is "some engineers sat down after talking to management".

y1n0 6 hours ago | parent [-]

This is correct.

croisillon 5 hours ago | parent | prev | next [-]

Eight shall be the number thou shalt count, and the number of the counting shall be eight. Nine shalt thou not count, neither count thou seven, excepting that thou then proceed to eight.

pdonis 3 hours ago | parent [-]

Ten is right out!

nine_k 6 hours ago | parent | prev | next [-]

Given a list of estimates of failure probabilities, finding the right mix of redundancy becomes a very tractable problem, maybe even freshman-level.
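A minimal sketch of that calculation, under two loud simplifying assumptions: replicas fail independently, and the system survives as long as at least one replica works. The function name `min_replicas` and all numbers are made up for illustration and are not taken from any real vehicle.

```python
def min_replicas(p_fail: float, target_loss: float, max_n: int = 64) -> int:
    """Smallest replica count n such that p_fail ** n <= target_loss.

    Assumes independent failures and a 1-out-of-n system (hypothetical model).
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < target_loss < 1.0:
        raise ValueError("target_loss must be strictly between 0 and 1")
    for n in range(1, max_n + 1):
        # Probability that all n independent replicas fail together.
        if p_fail ** n <= target_loss:
            return n
    raise ValueError("target not reachable within max_n replicas")


if __name__ == "__main__":
    # Illustrative inputs only: 1% per-unit failure, 1e-7 acceptable loss rate.
    print(min_replicas(0.01, 1e-7))
```

A real trade study would also weigh mass, power, and cost per replica, and would not assume independence.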

cubefox 6 hours ago | parent [-]

Getting the probabilities could be very difficult though, especially for issues that never occurred before.

notahacker 5 hours ago | parent | next [-]

The fault tolerance is mostly focused on background radiation flipping bits. We've got half a century of data on the frequency of those upsets and the extent to which they're correlated under different space conditions, not to mention the ability to irradiate prototypes of the flight computer with representative amounts of shielding in ground-based facilities...

kqr 5 hours ago | parent | prev | next [-]

For issues that have never occurred before, probabilities are the wrong tool. The right thing to do is list all the behaviour the vehicle must never exhibit and think of ways it still might, despite all redundancies -- maybe even despite every single component working as intended.

Lots of mission failures in history were caused by unexpected interactions between fully functional components. Probabilities of failures don't help with that.

SauntSolaire 5 hours ago | parent [-]

And that's why you test to failure (ideally under real or similar conditions): to surface the failures that have never occurred before, and start collecting data on them.

9dev 6 hours ago | parent | prev [-]

That is what you hire an army of engineers for.

amelius 3 hours ago | parent | prev [-]

Why use an even number? If they use a voting-style consensus mechanism, wouldn't an odd number make more sense?
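To make the even-versus-odd question concrete, here is a sketch of a generic strict-majority voter. It is an assumption for illustration only and says nothing about how Orion actually arbitrates between its processors; `majority_vote` is a hypothetical helper.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # Strict majority: more than half of the replicas must agree.
    return value if count > len(outputs) / 2 else None


if __name__ == "__main__":
    print(majority_vote([1, 1, 2]))     # three voters, one disagrees
    print(majority_vote([1, 1, 2, 2]))  # four voters, split evenly
```

With four voters a 2-2 split has no winner, so under this scheme four replicas tolerate the same single faulty voter as three do.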

tjohns 3 hours ago | parent [-]

Once you've lost more than ~2 processors, you're probably into the realm of common mode failures and voting won't save you. At that point, it's entirely possible you're just working with random data coming out of all your processors.
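One way to see why piling on replicas stops helping is a simplified beta-factor style model, sketched below. It assumes a fraction `beta` of each unit's failure probability comes from a shared cause that takes out every replica at once, and the remainder is independent. The function name and all numbers are hypothetical.

```python
def loss_probability(p_fail: float, n: int, beta: float) -> float:
    """Probability that all n replicas are lost (simplified, assumed model).

    p_fail: per-unit failure probability
    beta:   fraction of p_fail attributed to common-mode causes
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    common = beta * p_fail                       # one event kills every replica
    independent = ((1.0 - beta) * p_fail) ** n   # all replicas fail on their own
    # The system is lost if either route happens.
    return 1.0 - (1.0 - common) * (1.0 - independent)


if __name__ == "__main__":
    # Illustrative inputs only: 1% per-unit failure, 10% of it common-mode.
    for n in (1, 2, 4, 8):
        print(n, loss_probability(0.01, n, 0.1))
```

In this model the loss probability never drops below `beta * p_fail`, however many replicas are added, which is the floor that voting cannot get under.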