So a single configuration mistake in a single place wiped out external reachability of a major economy. It happened in the evening local time and should be fixable, modulo cache TTLs, by morning. This will limit the blast radius somewhat.

Still, at this level, brittle infrastructure is a political risk. The internet's famous "routing around damage" isn't quite working here. Should make for an interesting post mortem.

▲

gerdesj an hour ago | parent | next [-]

"The internet's famous "routing around damage" isn't quite working here."

DNS is a look up service that runs on the internet.

Internet routing of IP packets is what the internet does and that is working fine (for a given value of fine).

You remind me of someone who says "the internet is down" but really means: "I've forgotten my wifi password".

	▲	LastTrain an hour ago | parent [-]
		Us non pod-people caught his drift.

▲

belorn 2 hours ago | parent | prev | next [-]

I am reminded of the warning that zonemaster gives about putting your domain name servers on a single AS, as is common practice for many larger providers. A lot of people do not want others to see this as a problem since a single AS is a convenient configuration for routing, but it has the downside of being a single point of failure.

Building redundant infrastructure that can withstand BGP and DNS configuration mistakes is not that simple, but it can be done.

	▲	walrus01 32 minutes ago | parent [-]
		As the CPU/RAM resources to run an authoritative-only slave nameserver for a few domains are extremely minimal (mine run at a unix load of 0.01), it's a very wise idea to put your ns3 or something at a totally different service provider on another continent. It costs less than a cup of coffee per month.
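To make the single-AS concern concrete, here is a minimal sketch of the kind of check the zonemaster warning implies. The nameserver names and AS numbers are made up for illustration (the AS numbers are from the range reserved for documentation), and a real check would first have to resolve each nameserver and look up the origin AS of its address, which this sketch skips.

```python
from collections import defaultdict


def as_diversity(ns_to_asn):
    """Group nameservers by origin AS and flag the single-AS case."""
    by_asn = defaultdict(list)
    for nameserver, asn in ns_to_asn.items():
        by_asn[asn].append(nameserver)
    return {
        "distinct_asns": len(by_asn),
        # Fewer than two ASes means one routing mistake can take out every nameserver.
        "single_point_of_failure": len(by_asn) < 2,
        "by_asn": dict(by_asn),
    }


# Hypothetical data: two nameservers hosted in the same AS.
all_in_one = {"ns1.example.net": 64496, "ns2.example.net": 64496}
# Same setup plus an ns3 at an unrelated provider, as suggested above.
with_offsite = {**all_in_one, "ns3.example.org": 64511}

print(as_diversity(all_in_one)["single_point_of_failure"])
print(as_diversity(with_offsite)["single_point_of_failure"])
```

This only looks at routing diversity. It says nothing about a mistake in the zone data itself, which every secondary would faithfully copy.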

▲

pocksuppet 3 hours ago | parent | prev | next [-]

DNS is a centralization risk, yes. Somehow we've decided this is fine. DNSSEC isn't the only issue - your TLD's nameservers could also be offline, or censored in your country.

▲

skywhopper 3 hours ago | parent | next [-]

DNS is barely centralized. Is there an alternative global name lookup system that is less centralized without even worse downsides?

	▲	pocksuppet 2 hours ago | parent [-]
		BGP, but the names in question are limited to 128 bits, of which at most 48 will be looked up, and you don't get to choose which 48 bits are assigned to you.
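A small sketch of what "128 bits, of which at most 48 will be looked up" means in practice, using Python's standard `ipaddress` module. The address is from the IPv6 documentation range, and the /48 cutoff is taken from the comment above as an assumption, not as something every network enforces.

```python
import ipaddress


def routed_prefix(address, max_prefix_len=48):
    """Return the network of the given length that contains an IPv6 address."""
    # strict=False lets us pass a host address and have the host bits masked off.
    return ipaddress.ip_network(f"{address}/{max_prefix_len}", strict=False)


addr = ipaddress.ip_address("2001:db8:1234:5678::1")  # illustrative address
prefix = routed_prefix(addr)

print(addr.max_prefixlen)  # total bits in the "name"
print(prefix)              # the part that global routing would look at
```

Everything after the first 48 bits is handled inside whichever network announces the prefix.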

▲

greatgib 2 hours ago | parent | prev | next [-]

Normally it should not have been, with cache and all, but that was the past...

Think about what would happen the day that Let's Encrypt is broken for whatever reason, whether technical or political, like an erratic US leader and being located in the wrong country. Take into account the push by Let's Encrypt and the major web browsers to restrict certificate validity to short periods, like only a few days...

▲

muvlon 2 hours ago | parent [-]

Let's Encrypt has to be down for days before people begin to feel the pain. DNS is very different: it breaks stuff immediately, everywhere.

	▲	tharkun__ an hour ago | parent [-]
		No it doesn't. DNS breaks as soon as TTLs run out. It's your choice to set them so low that stuff breaks immediately.
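The TTL argument can be sketched with a toy cache. This is a deliberately simplified model, not how any particular resolver is implemented: the class name, the record, and the TTL values are made up, and time is passed in explicitly so the behaviour is easy to follow.

```python
class TtlCache:
    """Toy resolver cache: answers survive an upstream outage until their TTL expires."""

    def __init__(self):
        self._entries = {}  # name -> (answer, expires_at)

    def put(self, name, answer, ttl, now):
        self._entries[name] = (answer, now + ttl)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None:
            return None
        answer, expires_at = entry
        if now >= expires_at:
            # Expired: the resolver would have to ask upstream again.
            del self._entries[name]
            return None
        return answer


cache = TtlCache()
cache.put("www.example.com", "192.0.2.10", ttl=300, now=0)

print(cache.get("www.example.com", now=299))  # still answered from cache
print(cache.get("www.example.com", now=300))  # expired, needs the (broken) upstream
```

With a 300 second TTL the outage becomes visible within five minutes. With a TTL of a day, a cached name would keep resolving for much longer.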

▲

cyberax 3 hours ago | parent | prev [-]

Not really? .com and .net are still up

If Let's Encrypt goes down, half of the Internet will become inaccessible in a week.

▲

akerl_ 2 hours ago | parent | next [-]

Presumably if Let's Encrypt goes down and stays down for a week, the sites that go down are the ones that see that their CA went down and at no point in the week take the option to get certs from a different CA?

	▲	bluejekyll 23 minutes ago | parent [-]
		I guarantee that there are a ton of sites out there not monitoring their certs.
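For what it's worth, the core of a cert expiry check is tiny. A minimal sketch using only the standard library: the dates and the 14 day threshold are made up, and fetching the certificate from a live host is left out so the example stays self-contained. The date string is in the format Python's `ssl` module reports in the `notAfter` field of a peer certificate.

```python
import ssl

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now):
    """Days left before a cert expires.

    not_after: string such as "Jan 17 00:00:00 2030 GMT"
    now: current time as seconds since the epoch
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now) / SECONDS_PER_DAY


def needs_attention(not_after, now, threshold_days=14):
    """True when the cert expires within the (arbitrary) threshold."""
    return days_until_expiry(not_after, now) < threshold_days


# Illustrative values: pretend "now" is exactly one week before expiry.
now = ssl.cert_time_to_seconds("Jan 10 00:00:00 2030 GMT")
print(days_until_expiry("Jan 17 00:00:00 2030 GMT", now))
print(needs_attention("Jan 17 00:00:00 2030 GMT", now))
```

A site that ran something like this on a schedule would notice failed renewals long before the certificate actually expired.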

▲

sllabres an hour ago | parent | prev | next [-]

So it seems we need something like this [1] for IT infrastructure? ;)

[1] https://outerspaceinstitute.ca/crashclock/

▲

Muromec 3 hours ago | parent | prev | next [-]

>So a single configuration mistake in a single place wiped out external reachability of a major economy.

And fuck nothing at all happened as a result.

	▲	Our_Benefactors 2 hours ago | parent [-]
		Prove it? I’m sure many lifespans were lost to stress

▲

lschueller 3 hours ago | parent | prev | next [-]

I have a bad feeling that the impact will be quite severe for some services, as monitoring, performance, and security services might get disrupted, and just cleaning up is a big mess. Worst case, some OT will experience outages and/or damage. But maybe I am just overestimating the severity of this.

▲

walrus01 4 hours ago | parent | prev | next [-]

It looks like a failed key replacement during a scheduled maintenance event. Normally this sort of thing is thoroughly tested and has multiple eyes on for detailed review and planning before changes get committed, but obviously something got missed.

▲

the8472 3 hours ago | parent | prev [-]

Fail-closed protocols have introduced some brittleness. An HTTP 1.0 server from 1999 can probably still serve visitors today. An HTTPS/TLS 1.0 server from the same year couldn't.