ragall 2 hours ago
All 3 hyperscalers have vulnerabilities in their control planes: they're either a single point of failure, like AWS with us-east-1, or global, meaning that a faulty release can take the whole thing down. They all take AZ resilience to mean that existing compute will continue to work as before, but allocation of new resources might fail in multi-AZ or multi-region ways. It means that any service designed to survive a control plane outage must statically allocate its compute resources and have enough slack that it never relies on auto scaling. True for AWS/GCP/Azure.
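To put rough numbers on "statically allocate with enough slack", here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name `static_fleet_size`, the 20% headroom default, and the example traffic figures are hypothetical, and the optional `zones_lost` parameter goes a step beyond the point above by also sizing for the loss of a zone.

```python
def static_fleet_size(peak_rps, rps_per_instance, zones,
                      zones_lost=0, headroom_pct=20):
    """Instances to pre-provision in EACH zone so that peak load plus
    headroom fits without ever asking the control plane for more.

    All parameters are hypothetical inputs; integer arithmetic is used
    so the rounding is exact.
    """
    if zones <= zones_lost:
        raise ValueError("need more zones than the number assumed lost")
    # Total instances for peak load plus headroom (ceiling division).
    numerator = peak_rps * (100 + headroom_pct)
    denominator = 100 * rps_per_instance
    total_needed = -(-numerator // denominator)
    # Spread that total across only the zones assumed to survive.
    surviving = zones - zones_lost
    return -(-total_needed // surviving)


# Hypothetical service: 12,000 req/s peak, 500 req/s per instance, 3 zones.
per_zone = static_fleet_size(12000, 500, zones=3)
per_zone_n_minus_1 = static_fleet_size(12000, 500, zones=3, zones_lost=1)
print(per_zone, per_zone_n_minus_1)
```

The gap between the two results is the cost of the slack: the fleet is sized for the worst case up front and paid for all the time, instead of being requested from the control plane when needed.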
tbrownaw 2 hours ago
> It means that any service designed to survive a control plane outage must statically allocate its compute resources and have enough slack that it never relies on auto scaling. True for AWS/GCP/Azure.
That sounds oddly similar to owning hardware.
everfrustrated an hour ago
This outage report describes what appears to be a VM control plane failure (it mentions stop not working) across multiple regions. AWS has never had this type of outage in 20 years, yet Azure constantly has them. This is a total failure of engineering and has nothing to do with capacity. Azure is a joke of a cloud.