Remix.run Logo
MORPHOICES 2 days ago

I have been looking into BGP incidents for a while, and one of the things that continues to puzzle me is figuring out the difference between legitimate outages and noisy but expected behavior. ~

The mental model I’ve been using is: Intentional change (maintenance, policy update) Accidental leak (misconfig, partial rollout) Structural failure (dependency or upstream issue) I like to ask three questions first: Did the blast radius grow over time, or did it appear instantly? Did paths change symmetrically or only in one direction? Did things revert cleanly or drift back slowly? Some concrete tricks that helped: Look for AS-path prepending changes first. Compare visibility across regions rather than just globally.

Track “who benefits” from the new paths, even if only for a short time. I’m interested in how others approach this: What is your first indicator that things are indeed wrong? Do you prefer automated alerts or manual recognition of a pattern?