ffsm8 2 hours ago

> SREs debugging production outages to find a proximate "root" technical cause is a small fraction of the SRE function.

According to the stated goals of SRE, this is actually not just a small fraction - it's something that shouldn't happen at all. To be clear, I'm fully aware that it will always be necessary - but whenever it happens, it's because the site reliability engineer (SRE) overlooked something.

Hence, if that's considered a large part of the job, then you're just not an SRE as Google defined that role:

https://sre.google/sre-book/table-of-contents/

Very little connection to the blog post we're commenting on though - at least as far as I can tell.

At least I didn't find any focus on debugging. It put forward that the capability to produce reliable software is what will be the distinguishing factor in the future, and I think this holds up and is in line with the official definition of SRE.