themafia 3 days ago

That's what started the incident.

It was prolonged by the fact that Cloudflare didn't react correctly to BGP routes being withdrawn by a major peer, that the secondary routes had reduced capacity due to unaddressed problems, and that basic nuisance rate limiting had to be done manually.

It seems like they just build huge peering pipes and basically just hope for the best. They've maybe gotten so used to this working that they'll let degraded "secondary" links persist for much longer than they should. It's the typical "Swiss Cheese" style of failure.

vlovich123 3 days ago

Wasn’t the problem exacerbated precisely by withdrawing a BGP link because all the same traffic is then forced over a smaller number of physical links?