| ▲ | krick 13 hours ago |
| It would be a good thing, if it caused anything to change. It obviously won't. As if a single person reading this post wasn't aware that the Internet is centralized, and couldn't name a few specific sources of centralization (Cloudflare, AWS, Gmail, GitHub). As if it's the first time this has happened. As if, after the last time AWS failed (or the one before that, or the one before…), anybody stopped using AWS. As if anybody could viably stop using them. |
|
| ▲ | ectospheno 2 hours ago | parent | next [-] |
| I’m pretty Cloudflare-centric. I didn’t start that way. I had services spread out for redundancy. It was a huge pain. Then bots got even more aggressive than usual. I asked why I kept doing this to myself and finally decided my time was worth recapturing. Did everything become inaccessible during the last outage? Yep. Weighed against the time it saves me throughout the year, I call it a wash. No plans to move. |
|
| ▲ | GuB-42 3 hours ago | parent | prev | next [-] |
| Same idea with the CrowdStrike bug: it seems like it didn't have much of an effect on their customers, certainly not on my company at least, and the stock quickly recovered, in fact doing very well. For me, it looks like nothing changed, no lessons learned. |
| |
| ▲ | beanjuiceII 2 hours ago | parent [-] | | What do you mean, no lesson learned? Seems like you haven't been paying attention... there's always a lesson learned. | | |
| ▲ | peaseagee an hour ago | parent [-] | | I believe they mean that CrowdStrike learned that they could screw up at this level and keep their customers... | | |
| ▲ | thewebguyd 33 minutes ago | parent [-] | | That's true of a lot of "Enterprise" software. Microsoft enjoys success while abusing its enterprise customers on what seems like a daily basis at this point. For bigger firms, the reality is that it would probably cost more to switch EDR vendors than the outage itself cost them, and up to that point, CrowdStrike was the industry standard and enjoyed a really good track record and reputation. Depending on the business, there are long-term contracts and early termination fees, there's the need to run your new solution alongside the old during migration, and there are probably years of telemetry and incident data that you need to keep on the old platform, so even if you switch, you're still paying for CrowdStrike for the retention period. It was one (major) issue over 10+ years. Just like with Cloudflare, the switching costs are higher than the outage cost, unless there were a major outage of that scale multiple times per year. |
|
|
|
|
| ▲ | captainkrtek 13 hours ago | parent | prev | next [-] |
| > It would be a good thing, if it caused anything to change. It obviously won't. I agree wholeheartedly. The only change is internal to these organizations (e.g. Cloudflare, AWS): improvements will be made to the relevant systems, and some teams internally will also audit for similar behavior, add tests, and fix some bugs. However, nothing external will change. The cycle of pretending like you are going to implement multi-region fades after a week, and each company goes on leveraging all these services to the Nth degree, waiting for the next outage. I'm not advocating that organizations should/could do much; it's all pros and cons. But the collective blast radius is still impressive. |
| |
| ▲ | chii 12 hours ago | parent [-] | | The root cause is customers refusing to punish this downtime. Check out how hard customers punish blackouts from the grid - both via the wallet and via voting/gov't. It's why grids are now more reliable. So unless the backbone infrastructure gets the same flak, nothing is going to change. After all, any change is expensive, and the cost of that change needs to be worth it. | | |
| ▲ | MikeNotThePope 12 hours ago | parent | next [-] | | Is a little downtime such a bad thing? Trying to avoid some bumps and bruises in your business has diminishing returns. | | |
| ▲ | Xelbair 10 hours ago | parent | next [-] | | Even more so when most of the internet is also down. What are customers going to do? Go to a competitor that's also down? It is extremely annoying and will ruin your day, but as the movie quote goes - if everyone is special, no one is. | | |
| ▲ | throwaway0352 5 hours ago | parent | next [-] | | I think you’re viewing the issue from an office worker’s perspective. For us, downtime might just mean heading to the coffee machine and taking a break. But if a restaurant loses access to its POS system (which has happened), or you’re unable to purchase a train ticket, the consequences are very real. Outages like these have tangible impacts on everyday life. That’s why there’s definitely room for competitors who can offer reliable backup strategies to keep services running. | | |
| ▲ | mallets 4 hours ago | parent | next [-] | | Those are examples where they shouldn't be using the public cloud in the first place. Those services should be built local-first. Using a different, smaller cloud provider doesn't improve reliability (it likely makes it worse) if the architecture itself is wrong. | |
| ▲ | wongarsu 4 hours ago | parent | prev [-] | | Do any of those competitors actually have meaningfully better uptime? At a societal level, having everything shut down at once is an issue. But if you only have one POS system targeting only one backend URL (and that backend has to be online for the POS to work), then Cloudflare seems like one of the best choices. If the uptime provided by Cloudflare isn't enough, then the solution isn't a Cloudflare competitor; it's the ability to operate offline (which many POS systems have, including for card purchases) or at least multiple backends with different DNS, CDN, server location, etc. |
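The "multiple backends" approach can be as small as a client-side fallback list plus an offline queue. Here is a minimal TypeScript sketch, assuming two hypothetical backend URLs served through different DNS/CDN providers (the names, payload shape, and queue are illustrative, not any particular POS vendor's API):

```typescript
// Minimal failover sketch (hypothetical URLs/payloads, not a real POS API).
// Try each backend in order; if all fail, queue the transaction locally so
// the terminal keeps working while the upstream providers are down.
const BACKENDS = [
  "https://pos.example.com/api/charge",     // primary, e.g. fronted by Cloudflare
  "https://pos-alt.example.net/api/charge", // secondary, different DNS/CDN/provider
];

const offlineQueue: Record<string, unknown>[] = [];

async function submitCharge(payload: Record<string, unknown>): Promise<boolean> {
  for (const url of BACKENDS) {
    try {
      const res = await fetch(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
        signal: AbortSignal.timeout(3000), // fail fast so the fallback is actually useful
      });
      if (res.ok) return true; // accepted by this backend
    } catch {
      // timeout or network error: fall through and try the next backend
    }
  }
  offlineQueue.push(payload); // store-and-forward until a backend is reachable again
  return false;
}
```

Real offline card acceptance is far more involved (offline approval limits, signed transactions, replay and reconciliation on reconnect), but the failover loop itself costs almost nothing to add.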
| |
| ▲ | immibis 9 hours ago | parent | prev [-] | | They could go to your competitor that's up. If you choose to be up, your competitor's customers could go to you. | | |
| ▲ | dewey 9 hours ago | parent [-] | | If it’s that easy to get the exact same service/product from another vendor, then maybe your competitive advantage isn’t so high. If Amazon were down I’d just wait a few hours, as I don’t want to sign up on another site. | | |
| ▲ | MikeNotThePope 7 hours ago | parent [-] | | I agree. These days it seems like everything is a micro-optimization to squeeze out a little extra revenue. Eventually most companies lose sight of the need to offer a compelling product that people would be willing to wait for. |
|
|
| |
| ▲ | krige 12 hours ago | parent | prev | next [-] | | What's "a little downtime" to you might be work ruined and a day wasted for someone else. | | |
| ▲ | bloppe 9 hours ago | parent | next [-] | | I remember a Google Cloud outage years ago that happened to coincide with one of our customers' massively expensive TV ads. All the people who normally would've gone straight to their website instead got a 502. Probably a 1M+ loss for them, all things considered. We got an extremely angry email about it. | |
| ▲ | fragmede 10 hours ago | parent | prev | next [-] | | It's 2025. That downtime could be the difference between my cat pics not loading fast enough and someone's teleoperated robot surgeon glitching out. | |
| ▲ | cactusplant7374 3 hours ago | parent | prev [-] | | I have a lot of bad days every year. More than I can count. It's just part of living. |
| |
| ▲ | aaron_m04 12 hours ago | parent | prev [-] | | Depends on the business. |
| |
| ▲ | whatevaa 11 hours ago | parent | prev | next [-] | | Grid reliability depends on where you live. In some places, a UPS or even a generator is a must-have. So it's a bad example, I would say. | |
| ▲ | LoganDark 5 hours ago | parent | prev | next [-] | | > Checkout how hard customers punish blackouts from the grid - both via wallet, but also via voting/gov't. What? Since when has anyone ever been free to just up and stop paying for power from the grid? Are you going to pay $10,000 - $100,000 to have another power company install lines? Do you even have another power company in the area? State? Country? Do you even have permission for that to happen near your building? Any building? The same is true for internet service, although personally I'd gladly pay $10,000 - $100,000 to have literally anything else at my location, but there are no proper other wired providers and I'll die before I ever install any sort of cellular router. Also this is a rented apartment so I'm fucked even if there were competition, although I plan to buy a house in a year or two. | | | |
| ▲ | mopsi 12 hours ago | parent | prev [-] | | Downtimes happen one way or another. The upside of using Cloudflare is that bringing things back online is their problem, not mine as it is when I self-host. :] Their infrastructure went down for a pretty understandable reason (let the one who has never caused that kind of error cast the first stone) and was brought back within a reasonable time. |
|
|
|
| ▲ | ehhthing 12 hours ago | parent | prev | next [-] |
| With the rise in unfriendly bots on the internet as well as DDoS botnets reaching 15 Tbps, I don’t think many people have much of a choice. |
| |
| ▲ | swiftcoder 10 hours ago | parent [-] | | The cynic in me wonders how much blame the world's leading vendor of DDoS prevention might share in the creation of that particular problem. | | |
| ▲ | immibis 9 hours ago | parent [-] | | They provide free services to DDoS-for-hire services and do not terminate the services when reported. | | |
| ▲ | zamadatix 5 hours ago | parent [-] | | Not that I doubt examples exist (I've yet to be at a large place with zero failures in responding to such issues over the years), but it'd be nice if you'd share the specific examples you have in mind if you're going to bother commenting about it. It helps people understand how much of this is a systemic problem worth caring about, versus a comment that more easily falls into many other buckets. I'd try to build trust off the user profile as well, but it proclaims you're shadowbanned for two different reasons - despite me seeing your comment. One related topic I've seen brought up is Workers abuse https://www.fortra.com/blog/cloudflare-pages-workers-domains..., but that cuts against the claim that they do nothing when reported. |
|
|
|
|
| ▲ | stingraycharles 10 hours ago | parent | prev | next [-] |
| It’s just a function of costs vs. benefits. For most people, building redundancy at this layer costs far more than the benefit is worth. If Cloudflare or AWS goes down, the outage is usually so big that smaller players have an excuse and people accept that. It’s as simple as that. “Why isn’t your site working?” “Half the internet is down, here, read this news article: …” “Oh, okay, let me know when it’s back!” |
|
| ▲ | markus_zhang 6 hours ago | parent | prev | next [-] |
| These outages are too few and far between. It's only gonna drive changes if it becomes a monthly event. If businesses started losing connectivity for 8 hours every month, maybe the bigger ones would run for self-hosting, or at least some capacity for self-hosting. |
|
| ▲ | testdelacc1 11 hours ago | parent | prev | next [-] |
| If anything, centralisation shields companies using a hyperscaler from criticism. You’ll see downtime no matter where you host. If you self host and go down for a few hours, customers blame you. If you host on AWS and “the internet goes down”, then customers treat it as akin to an act of God, like a natural disaster that affects everyone. It’s not great being down for hours, but that will happen regardless. Most companies prefer the option that helps them avoid the ire of their customers. Where it’s a bigger problem is when all of a critical industry, like retail banking in a country, chooses AWS. When AWS goes down, all citizens lose access to their money. They can’t pay for groceries or transport. They’re stranded and starving; life grinds to a halt. But even then, this is not the bank’s problem, because they’re not doing worse than their competitors. It’s something for the banking regulator and government to worry about. I’m not saying the bank shouldn’t worry about it, I’m saying in practice they don’t worry about it unless the regulator makes them worry. I completely empathise with people frustrated with this status quo. It’s not great that we’ve normalised a few large outages a year. But for most companies, this is the rational thing to do. And barring a few critical industries like banking, it’s also rational for governments not to intervene. |
| |
| ▲ | BlackFly 6 hours ago | parent | next [-] | | I think this really depends on your industry. If you cannot give a patient life-saving dialysis because you don't have a backup generator, then you are likely facing some liability. If you cannot give a patient life-saving dialysis because your scheduling software is down because of a major outage at a third party and you have no local redundancy, then you are in a similar situation. Obviously this depends on your jurisdiction, and we are probably in different ones, but I feel confident that you want to live in a district where a hospital is reasonably responsible for such foreseeable disasters. | |
| ▲ | testdelacc1 an hour ago | parent [-] | | Yeah, I mentioned banking because it's what I was familiar with, but the medical industry is going to be similar. They do differ, though - it’s never ok for a hospital to be unable to dispense care, but it is somewhat ok for one bank to be down. We just assume that people have at least two bank accounts. The problem the banking regulator faces is that when AWS goes down, all banks go down simultaneously. Not terrible for any individual bank, but catastrophic for the country. And now you see what a juicy target an AWS DC is for an adversary. They go down on their own now, but surely Russia or others are looking at this and thinking “damn, one missile at the right data center and life in this country grinds to a halt”. |
| |
| ▲ | graemep 8 hours ago | parent | prev | next [-] | | > If anything, centralisation shields companies using a hyperscaler from criticism. You’ll see downtime no matter where you host. If you self host and go down for a few hours, customers blame you. Not just customers. Your management take the same view. Using hyperscalers is great CYA. The same for any replacement of internally provided services with external ones from big names. | | |
| ▲ | testdelacc1 7 hours ago | parent [-] | | Exactly. No one got fired for using AWS. Advocating for self-hosting or a smaller provider means you get blamed when the inevitable downtime comes around. |
| |
| ▲ | DeathArrow 11 hours ago | parent | prev [-] | | > If anything, centralisation shields companies using a hyperscaler from criticism. You’ll see downtime no matter where you host. If you self host and go down for a few hours, customers blame you. What if you host on AWS and only you go down? How does hosting on AWS shield you from criticism? | | |
| ▲ | testdelacc1 11 hours ago | parent [-] | | This discussion is assuming that the outage is entirely out of your control because the underlying datacenter you relied on went down. Outages because of bad code do happen and the criticism is fully on the company. They can be mitigated by better testing and quick rollbacks, which is good. But outages at the datacenter level - nothing you can do about that. You just wait until the datacenter is fixed. This discussion started because companies are actually fine with this state of affairs. They are risking major outages but so are all their competitors so it’s fine actually. The juice isn’t worth the squeeze to them, unless an external entity like the banking regulator makes them care. |
|
|
|
| ▲ | tcfhgj 9 hours ago | parent | prev | next [-] |
| > As if anybody could viably stop using them. You can, and even save money. |
|
| ▲ | sjamaan 12 hours ago | parent | prev | next [-] |
| Same with the big Crowdstrike fail of 2024. Especially when everyone kept repeating the laughable statement that these guys have their shit in order, so it couldn't possibly be a simple fuckup on their end. Guess what, they don't, and it was. And nobody has realized the importance of diversity for resilience, so all the major stuff is still running on Windows and using Crowdstrike. |
| |
| ▲ | c0l0 10 hours ago | parent [-] | | I wrote https://johannes.truschnigg.info/writing/2024-07-impending_g... in response to the CrowdStrike fallout, and was tempted to repost it for the recent Cloudflare whoopsie. It's just too bad that publishing rants won't change the darned status quo! :') | |
| ▲ | graemep 8 hours ago | parent [-] | | People will not do anything until something really disastrous happens. Even afterwards, memories can fade. CrowdStrike has not lost many customers. Covid is a good parallel. A pandemic was always possible; there is always a reasonable chance of one over the course of decades. However, people did not take it seriously until it actually happened. A lot of Asian countries are a lot better prepared for a tsunami than they were before 2004. The UK was supposed to have emergency plans for a pandemic, but they were for a flu variant, and I suspect even those plans were under-resourced and not fit for purpose. We are supposed to have plans for a solar storm, but when another Carrington event occurs I very much doubt we will deal with it smoothly. | |
|
|
|
| ▲ | fragmede 9 hours ago | parent | prev [-] |
| > It obviously won't. Here's where we separate the men from the boys, the women from the girls, the Enbys from the enbetts, and the SREs from the DevOps. If you went down when Cloudflare went down, do you go multicloud so that can't happen again, or do you shrug your shoulders and say "well, everyone else is down"? Have some pride in your work, do better, be better, and strive for greatness. Have backup plans for your backup plans, and get out of the pit of mediocrity. Or not: shit's expensive, Kubernetes is too complicated, and "no one" needs that. |
| |
| ▲ | rkomorn 9 hours ago | parent [-] | | You make the appropriate cost/benefit decision for your business and ignore apathy on one side and dogma on the other. |
|