n_u 3 days ago
P(two failures within MTTR for first node)
= P(one failure) * P(second failure within MTTR of first node | one failure)

Independence simplifies things:

= P(one failure) * P(second failure within MTTR of first node)
= P(one failure) * (1 - e^(-λx))

where x = MTTR for first node and λ = 1/MTBF.

Plugging in the numbers from your blog post:

P(one failure within 30 days) = 0.01 (not sure if this part is correct)
MTTR = 5 minutes + 5 hours ≈ 5.083 hours
MTBF = 30 days / 0.01 = 3000 days = 72000 hours

0.01 * (1 - e^(-5.083 / 72000)) = 0.0000007 ≈ 0.00007%

I must be doing something wrong cuz I'm not getting the 0.000001% you have in the blog post. If there's some existing work on this I'd be stoked to read it; I can't quite find a source.

Also, there are two nodes that have the potential to fail while the first is down, but that would make my answer larger, not smaller.
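The arithmetic above can be checked with a short script. This is only a sketch of the calculation as stated in the comment; the variable names are made up for illustration, and it assumes exponentially distributed failures with rate λ = 1/MTBF, as the comment does.

```python
import math

# Inputs as stated in the comment (variable names are illustrative).
p_fail_30_days = 0.01                    # P(one failure within 30 days)
mttr_hours = 5 + 5 / 60                  # 5 hours + 5 minutes
mtbf_hours = 30 * 24 / p_fail_30_days    # 30 days / 0.01, in hours

# Exponential failure model: P(failure within x) = 1 - e^(-λx), λ = 1/MTBF.
rate = 1 / mtbf_hours
p_second_within_mttr = 1 - math.exp(-rate * mttr_hours)

# Independence: multiply by the probability of the first failure.
p_two_failures = p_fail_30_days * p_second_within_mttr

print(f"MTBF = {mtbf_hours:.0f} hours")
print(f"P(second failure within MTTR) = {p_second_within_mttr:.3e}")
print(f"P(two failures) = {p_two_failures:.3e} = {p_two_failures * 100:.7f}%")
```

This gives roughly 7.06e-5 for the second-failure term and roughly 7.06e-7 (about 0.00007%) for the product, in line with the figures in the comment.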
rcrowley 3 days ago | parent
I computed P(node failure within MTTR) = 0.00007, same as you. I extrapolated this to the outage scenario:

P(at least two node failures within MTTR)
= P(node failure within MTTR)^2 * (1 - P(node failure within MTTR)) + P(node failure within MTTR)^3
= 5.09 * 10^-9

which rounds to 0.000001%.
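For anyone who wants to reproduce this, here is a rough sketch that evaluates the expression exactly as written in the reply. The value of p is an assumption: it is recomputed from the parent comment's inputs (MTTR of 5 hours 5 minutes, MTBF of 72000 hours), which may not be the exact figure used for the 5.09 * 10^-9 result.

```python
import math

# Assumed inputs, taken from the parent comment rather than the blog post.
mttr_hours = 5 + 5 / 60
mtbf_hours = 72000

# p = P(node failure within MTTR) under an exponential failure model.
p = 1 - math.exp(-mttr_hours / mtbf_hours)

# The expression as written in the reply.
p_outage = p**2 * (1 - p) + p**3

print(f"p = {p:.3e}")
print(f"p^2 * (1 - p) + p^3 = {p_outage:.3e}")
print(f"as a percentage: {p_outage * 100:.3e}%")
```

Algebraically the expression collapses to p^2, so with these inputs it comes out near 4.98e-9, the same order of magnitude as the 5.09 * 10^-9 quoted; the small gap presumably comes from rounding or slightly different inputs.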