JoshTriplett 4 hours ago

There's a great piece of software called "molly-guard", which intercepts calls to "poweroff" and "reboot" and similar. It checks if it's being invoked via an SSH session, and if so, it asks you to type the name of the system you're shutting down. That way, you never accidentally shut down a remote server when you meant to shut down your own system (or a different server).
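For illustration, the check described above could be sketched in Python roughly like this. It is not molly-guard's actual code. It assumes an SSH login can be recognized by the `SSH_CONNECTION` or `SSH_TTY` environment variables that OpenSSH sets, and the function names `is_ssh_session` and `confirm_shutdown` are made up for the example.

```python
import os
import socket


def is_ssh_session(env):
    """Guess whether we are in an SSH login (assumption: OpenSSH sets these)."""
    return bool(env.get("SSH_CONNECTION") or env.get("SSH_TTY"))


def confirm_shutdown(env, hostname, ask):
    """Return True if the shutdown may proceed.

    Local sessions pass straight through. Over SSH, the user has to type
    the name of the machine they are about to shut down.
    """
    if not is_ssh_session(env):
        return True
    typed = ask("Type the hostname of the machine to shut down: ")
    return typed.strip() == hostname


if __name__ == "__main__":
    ok = confirm_shutdown(os.environ, socket.gethostname(), input)
    print("proceeding" if ok else "aborted: hostname did not match")
```

Passing the environment and the prompt function in as arguments keeps the check easy to test without a real SSH session.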

kqr 2 hours ago | parent | next [-]

I once accidentally rebooted the reverse proxy for all our production traffic. We had a very quiet two minutes while it came back up.

After that we installed molly-guard with a check for the number of active connections. Made it painless to reboot standby proxies and difficult to reboot active ones.
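A connection-count check of that kind might look something like the Python sketch below. The comment does not say how the connections were counted, so this is an assumption: it counts established TCP connections on one local port by parsing text in the format of Linux's `/proc/net/tcp` (IPv4 only), where state `01` means ESTABLISHED. The names `count_established` and `safe_to_reboot` and the threshold are hypothetical.

```python
def count_established(proc_net_tcp_text, local_port):
    """Count ESTABLISHED connections on local_port in /proc/net/tcp text."""
    count = 0
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows start with an index such as "0:"; the header row does not.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        local_address, state = fields[1], fields[3]
        # The local address is "HEXIP:HEXPORT"; state "01" is ESTABLISHED.
        port = int(local_address.rsplit(":", 1)[1], 16)
        if port == local_port and state == "01":
            count += 1
    return count


def safe_to_reboot(active_connections, max_active=0):
    """Allow the reboot only when the proxy is (nearly) idle."""
    return active_connections <= max_active
```

On a Linux host this could be fed with `open("/proc/net/tcp").read()`. A standby proxy with no traffic passes the check, while an active one does not.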

(We also instituted pairing on production proxy maintenance. I'm not a fan of pair programming but pair maintenance is great.)

I like telling junior hires about this incident because it teaches them that (a) anyone can make mistakes, (b) even serious mistakes usually aren't that dangerous, (c) you can learn a lot from mistakes with the right mindset, (d) we cannot prevent mistakes but with the right system design we can reduce their consequences.

magicalhippo 3 hours ago | parent | prev [-]

Another fun one is disabling the network interface on a remote server. An acquaintance did that by mistake on a cloud VM running some core services, and the cloud provider had no virtual console for some reason. Ended up having to write off the VM and restore from backup. Fun day at the office.

adrian_b 2 hours ago | parent [-]

Long ago, I once managed to cut off my own SSH access to a remote server after some firewall changes. That of course required a long trip to the server for physical access.

However, that was a good thing, because ever since I have been extra careful with any change that could affect the firewall in any way. (This is not limited to changes in firewall rules: on some systems the versions of the firewall program and the kernel must match, so an inconsistent update may make the firewall revert to its default state of denying all connections.)

kqr 2 hours ago | parent [-]

I can warmly recommend the nohup-sleep-disable-cancel pattern for this, as a dead man's switch for dangerous changes.

https://entropicthoughts.com/locking-yourself-out-with-firew...
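Going only by the name of the pattern, and not by the linked article, the idea can be sketched in Python as follows: before making the risky change, start a detached background job that sleeps and then undoes or disables the change. If access still works afterwards, cancel the job. The helper names `arm_dead_mans_switch` and `disarm` are made up for this example, and the revert command is whatever suits the system in question.

```python
import os
import signal
import subprocess


def arm_dead_mans_switch(revert_cmd, delay_seconds):
    """Start a detached shell that waits, then runs revert_cmd.

    start_new_session=True puts the job in its own session, so it keeps
    running if the SSH connection drops (the role nohup plays in a shell).
    """
    return subprocess.Popen(
        ["sh", "-c", f"sleep {int(delay_seconds)}; {revert_cmd}"],
        start_new_session=True,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def disarm(switch):
    """Cancel the pending revert once you have confirmed you still have access."""
    # The job leads its own process group, so this also stops the sleep.
    os.killpg(switch.pid, signal.SIGTERM)
    switch.wait()
```

Usage would be along these lines: arm the switch with a five-minute delay and a command that disables the new rules, apply the change, open a fresh SSH connection to prove access still works, then call `disarm`. If you have locked yourself out, you wait for the timer instead of travelling to the server.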