| ▲ | javier2 2 hours ago |
| Yeah. I only work for a small company, but you can be certain we will not update the status page if only a small portion of customers are affected, and if we are fully down, rest assured there will be no available hands to keep the status page updated |
|
| ▲ | s_dev 2 hours ago | parent | next [-] |
| >rest assured there will be no available hands to keep the status page updated That's not how status pages work if implemented correctly. The real reason status pages aren't updated is SLAs. If you agree on a contract to have 99.99% uptime, your status page better reflect that or it invalidates many contracts. This is why AWS also lies about its uptime and status page. These services rarely experience outages according to their own figures, but rather 'degraded performance' or some other language that talks around the issue rather than acknowledging it. It's like when buying a house: you need an independent surveyor, not the one offered by the developer/seller, to check for problems with foundations or rotting timber. |
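For a sense of scale on the 99.99% figure, here is a minimal sketch of the arithmetic behind an uptime percentage. The function name and the 30-day billing period are assumptions for illustration; real contracts define their own measurement window and exclusions.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in a period under a given uptime target.

    Assumes a simple calendar window with no maintenance exclusions,
    which is a simplification of how real SLAs are worded.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # 99.99% over 30 days leaves roughly 4.32 minutes of downtime.
    print(round(allowed_downtime_minutes(99.99), 2))
    # 99.999% over 30 days leaves roughly 0.43 minutes (about 26 seconds).
    print(round(allowed_downtime_minutes(99.999), 2))
```

With a budget that small, a single acknowledged outage of a few minutes would already exhaust the month, which is one way to read the incentive described above.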
| |
| ▲ | redm an hour ago | parent | next [-] | | SLAs usually just give you a small credit for the exact period of the incident, which is asymmetric to the impact. We always have to negotiate for termination rights for failing to meet SLA standards but, in reality, we never exercise them. Reality is that in an incident, everyone is focused on fixing the issue, not updating status pages; automated checks fail or have false positives often too. :/ | |
| ▲ | laurent123456 2 hours ago | parent | prev | next [-] | | This is weird - at this level contracts are supposed to be rock solid so why wouldn't they require accurate status reporting? That's trivial to implement, and you can even require it to be hosted on a neutral third party like UptimeRobot and be done with it. I'm sure there are gray areas in such contracts but something being down or not is pretty black and white. | | |
| ▲ | franga2000 an hour ago | parent | next [-] | | > something being down or not is pretty black and white This is so obviously not true that I'm not sure if you're even being serious. Is the control panel being inaccessible for one region "down"? Is their DNS "down" if the edit API doesn't work, but existing records still get resolved? Is their reverse proxy service "down" if it's still proxying fine, just not caching assets? | | |
| ▲ | laurent123456 a minute ago | parent [-] | | I understand there are nuances here, and I may be oversimplifying, but if part of the contract effectively says "You must act as a proxy for npmjs.com" yet the site has been returning 500 status codes across all regions several times within a few weeks while still reporting a shining 99.99% uptime, something doesn't quite add up. Still, I'm aware I don't know much about these agreements, and I'm assuming the people involved aren't idiots and have already considered all of this. |
| |
| ▲ | remus an hour ago | parent | prev [-] | | > I'm sure there are gray areas in such contracts but something being down or not is pretty black and white. Is it? Say you've got some big geographically distributed service doing some billions of requests per day with a background error rate of 0.0001%, what's your threshold for saying whether the service is up or down? Your error rate might go to 0.0002% because a particular customer has an issue so that customer would say it's down for them, but for all your other customers it would be working as normal. |
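The threshold question above can be made concrete with a small sketch. Everything here is an assumption for illustration: the function name, the baseline of 0.0001% (1e-6), the "twice baseline means degraded" rule, and the 50% cutoff for "down" are arbitrary choices, not anything a provider is known to use.

```python
def classify(total_requests: int, failed_requests: int,
             baseline_rate: float = 1e-6,
             degraded_factor: float = 2.0,
             down_rate: float = 0.5) -> str:
    """Label a traffic slice as 'up', 'degraded', or 'down' from its error rate.

    All thresholds are hypothetical; picking them is exactly the
    judgment call being debated in the thread.
    """
    if total_requests == 0:
        return "unknown"
    rate = failed_requests / total_requests
    if rate >= down_rate:
        return "down"
    if rate >= baseline_rate * degraded_factor:
        return "degraded"
    return "up"


if __name__ == "__main__":
    # Whole fleet: 1,500 failures out of a billion requests.
    print("global:", classify(1_000_000_000, 1_500))
    # One customer: 6,000 of their 10,000 requests failed.
    print("customer:", classify(10_000, 6_000))
```

The same incident can be labeled differently depending on whether the rate is computed globally or per customer, so a single up/down flag hides that choice.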
| |
| ▲ | lucianbr 2 hours ago | parent | prev | next [-] | | Are the contracts so easy to bypass? Who signs a contract with an SLA knowing the service provider will just lie about the availability? Is the client supposed to sue the provider any time there is an SLA breach? | | |
| ▲ | netdevphoenix an hour ago | parent | next [-] | | Anyone who doesn't have any choice financially or gnostically. Same reason why people pay Netflix despite the low quality of most of their shows and the constant termination of TV series after 1 season. Same reason why people put up with Meta not caring about moderation or harmful content. The power dynamics resemble a monopoly. | | |
| ▲ | ozim an hour ago | parent [-] | | Most services are not really critical but customers want to have 99.999% on paper. Most of the time people will just get by and ignore even a full day of downtime as a minor inconvenience. Loss of revenue for the day - well you most likely will have to eat that, because going to court and having lawyers fighting over it most likely will cost you as much as just forgetting about it. If your company goes bankrupt because AWS/Cloudflare/GCP/Azure is down for a day or two - guess what - you won't have money to sue them ¯\_(ツ)_/¯ and most likely will have a bunch of more pressing problems on your hands. |
| |
| ▲ | heipei an hour ago | parent | prev | next [-] | | The client is supposed to monitor availability themselves, that is how these contracts work. | |
| ▲ | immibis an hour ago | parent | prev [-] | | The company that is trying to cancel its contract early needs to prove the SLA was violated, which is very easy if the company providing the service also provides a page that says their SLA was violated. Otherwise it's much harder to prove. |
| |
| ▲ | 8cvor6j844qw_d6 2 hours ago | parent | prev | next [-] | | I imagine there will be many levels of "approvals" to get the status page actually showing down, since SLA uptime contracts are involved. | |
| ▲ | javier2 2 hours ago | parent | prev [-] | | I work for a small company. We have no written SLA agreements. |
|
|
| ▲ | lawnchair 2 hours ago | parent | prev | next [-] |
| I have to say that if an incident becomes so overwhelming that nobody can spare even a moment to communicate with customers, that points to a deeper operational problem. A status page is not something you update only when things are calm. It is part of the response itself. It is how you keep users informed and maintain trust when everything else is going wrong. If communication disappears entirely during an outage, the whole operation suffers. And if that is truly how a company handles incidents, then it is not a practice I would want to rely on. Good operations teams build processes that protect both the system and the people using it. Communication is one of those processes. |
|
| ▲ | onion2k an hour ago | parent | prev | next [-] |
| > if we are fully down, rest assured there will be no available hands to keep the status page updated There is no quicker way for customers to lose trust in your service than for it to be down and for them to not know that you're aware and trying to fix it as quickly as possible. One of the things Cloudflare gets right is the frequent public updates when there's a problem. You should give someone the responsibility for keeping everyone up to date during an incident. It's a good idea to give that task to someone quite junior - they're not much help during the crisis, and they learn a lot about both the tech and communication by managing it. |
|
| ▲ | 2 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | GoblinSlayer 2 hours ago | parent | prev [-] |
| You won't be able to update the status page due to failures anyway. |
| |
| ▲ | PhilippGille 32 minutes ago | parent [-] | | Why not? A good status page runs on a different cloud provider in a different region, specifically to not be affected at the same time. |
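As a rough illustration of the point about independent hosting, here is a minimal sketch of a probe that could run from separate infrastructure and feed a status page. The function names, the status labels, and the example URL are hypothetical; only the `urllib` calls are standard library.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error code such as 500.
        return exc.code
    except OSError:
        # Covers URLError, timeouts, DNS failures, and refused connections.
        return 0


def summarize(targets: dict[str, str],
              fetch: Callable[[str], int] = http_status) -> dict[str, str]:
    """Map each named endpoint to a coarse status label (labels are assumptions)."""
    report = {}
    for name, url in targets.items():
        code = fetch(url)
        if code == 0:
            report[name] = "unreachable"
        elif 200 <= code < 400:
            report[name] = "operational"
        else:
            report[name] = "failing"
    return report


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the service being monitored.
    print(summarize({"api": "https://example.com/"}))
```

Passing `fetch` as a parameter keeps the classification testable without network access. The probe only helps if it runs somewhere that does not share the failure domain of the service it watches.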
|