swiftcoder 3 days ago

The scenarios in the article are all about mission-critical disaster recovery - we don't even trust the majority of our human colleagues with those scenarios! AI won't make inroads there without humans in the loop until it is 100% trustworthy.

tptacek 3 days ago | parent | next

Right, so: having an agent go drop index segments from a search cluster to resolve a volume-utilization problem is a bad idea. Better for it to just suggest: "these old index segments are using up 70% of the storage on this volume, your emergency search cluster outage would be resolved if you dropped them, and here's how you'd do that".
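
As a concrete sketch of that suggest-only shape, assuming a hypothetical on-disk segment layout (the paths, extension, and threshold are invented for illustration; a real cluster would expose this through its own admin APIs):

    import shutil
    from pathlib import Path

    # Hypothetical locations; a real search cluster would expose
    # segment sizes through its own admin APIs.
    VOLUME = "/var/data"
    INDEX_DIR = Path("/var/data/search/segments")

    def suggest_segment_cleanup() -> None:
        total, used, _free = shutil.disk_usage(VOLUME)
        segments = sorted(INDEX_DIR.glob("*.seg"),
                          key=lambda p: p.stat().st_mtime)
        seg_bytes = sum(p.stat().st_size for p in segments)
        print(f"{VOLUME} is {used / total:.0%} full; "
              f"index segments hold {seg_bytes / used:.0%} of used space.")
        # Suggest, never execute: emit commands for a human to review.
        for path in segments[:5]:
            print(f"  suggested: rm {path}  # oldest segment, verify first")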

But there are plenty of active investigative steps you'd want to take in generating hypotheses for an outage. Weakly's piece strongly suggests AI tools not take these actions, but rather suggest them to operators. This is a waste of time, and time is the currency of incident resolution.
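
A sketch of what those active investigative steps could look like while staying read-only; the selection of checks is illustrative, not exhaustive:

    import subprocess

    # Read-only evidence gathering for hypothesis generation; each
    # command observes state without changing it.
    READ_ONLY_CHECKS = [
        ["df", "-h"],                                            # volume utilization
        ["uptime"],                                              # load averages
        ["journalctl", "-p", "err", "-n", "50", "--no-pager"],   # recent errors
    ]

    def gather_evidence() -> None:
        for cmd in READ_ONLY_CHECKS:
            result = subprocess.run(cmd, capture_output=True,
                                    text=True, timeout=30)
            print(f"$ {' '.join(cmd)}")
            print(result.stdout[:2000])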

datadrivenangel 3 days ago | parent | prev | next

And the author assumes that these humans are going to be very rigorous. That holds for good SRE teams, but even then not consistently.

agentultra 3 days ago | parent

We don't need humans to be perfect to have reliable responses to critical situations. Systems are more important than individuals at that level. We understand that people make mistakes, and we design systems and processes to compensate.

The problem with unattended AI in these situations is precisely the lack of context, awareness, intuition, intention, and communication skills.

If you want automation in your disaster recovery system, you want something that fails reliably and immediately. Non-determinism is not part of a good plan. "Maybe it will recover from the issue, or maybe it will delete the production database and beg for forgiveness later" isn't something you want to lean on.
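
One way to read "fails reliably and immediately" in code: a deterministic step runner that checks an explicit precondition before each action and aborts on the first surprise. A minimal sketch; the Step shape is invented for illustration:

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Step:
        name: str
        precondition: Callable[[], bool]  # must hold before acting
        action: Callable[[], None]        # deterministic, idempotent

    def run_recovery(steps: List[Step]) -> None:
        for step in steps:
            if not step.precondition():
                # Fail reliably and immediately: stop, don't improvise,
                # and hand the incident back to a human.
                raise RuntimeError(f"precondition failed at {step.name!r}")
            step.action()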

Humans have deleted databases before and will again, I'm sure. And we have backups in place if that happens. And if you don't, then you should fix that. But we should also fix the part of the system that allows a human to accidentally delete a database.
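
One way to fix that part of the system: put a guard in front of destructive statements so neither a human nor an agent can run them casually. A sketch; the statement prefixes and confirmation scheme are illustrative only:

    # Guardrail sketch: refuse destructive SQL unless explicitly
    # confirmed. The prefixes and flag are illustrative only.
    DESTRUCTIVE_PREFIXES = ("DROP DATABASE", "DROP TABLE",
                            "TRUNCATE", "DELETE FROM")

    def guarded_execute(sql: str, confirmed: bool = False) -> None:
        if sql.strip().upper().startswith(DESTRUCTIVE_PREFIXES) and not confirmed:
            raise PermissionError(
                "destructive statement blocked; requires explicit confirmation")
        ...  # hand off to the real database driver here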

"But an AI could do that too!" No. It's not a person. It's an algorithm with lots of data that can do neat things, but until we can make sure it does one particular thing deterministically, there's no point in using it for critical systems. It's dangerous. You don't want a human operator coming into a fire to find that the AI has already made it worse... and then having to respond to that mess on top of everything else.

lupire 3 days ago | parent

What happens when you walk into a fire and you don't know what to do? Or can't do it quickly enough?

agentultra 3 days ago | parent | next

Who is sending in untrained people to manage fires? Maybe that organization deserves what's coming to them.

An extreme example: nuclear reactors. You don't want untrained people walking into a fire with the expectation that they can manage the situation.

A less extreme example: financial systems. You don't want untrained people walking into a fire that is losing your customers' funds, with the expectation that they can manage the situation.

swiftcoder 2 days ago | parent | prev

We don't throw new hires into the deep end of the on-call rotation on their first day. We make sure they learn the systems, provide them with runbooks, assign an experienced mentor for their first on-call rotation, and have a clear escalation path if they are in over their heads or need additional resources.

topaz0 3 days ago | parent | prev

Or it will, and disaster will ensue.