▲ | heelix 6 days ago | |
The first time I'd seen a blameless post mortem, I thought it was a load of bs, as another organization had just caused the first significant production outage our app had ever had. Convenient... no blame. We went along with the process and it did not take very long to understand how this changed the culture. If someone horked a step on a manual deploy, the real question is why is this not automated. People stopped hiding mistakes - so the old snipe hunts where information to trouble shoot might 'disappear' faded and made it easier to debug and then figure out what could be done better. It helped the business understand that 'running in production' did not mean done. Ryan, if you are out there reading this - ty. | ||
▲ | ethbr1 6 days ago | parent [-] | |
100%. It was an eye opening experience. Felt somewhat akin to running an RCA on "Why do people hide mistakes?" Well, because it's in their self interest to do so! |