lokar 4 hours ago

This applies to all infra.

Why can you delete a network load balancer that is still getting traffic?

Why can you delete a VM that is getting non-trivial network traffic?

Why can you delete a database that has sessions / requests in the last hour?

Why can you drop a table that has queries in the last hour?
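The rule these questions imply could be expressed as a pre-delete check. This is a minimal sketch, assuming your monitoring can report when a resource last served traffic or queries. `guard_delete`, `RecentActivityError`, and the one-hour default window are hypothetical names chosen for illustration, and `force` stands in for a break-glass override.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class RecentActivityError(Exception):
    """Raised when a delete is refused because the resource looks in use."""


def guard_delete(
    resource: str,
    last_activity: Optional[datetime],
    now: datetime,
    window: timedelta = timedelta(hours=1),
    force: bool = False,
) -> None:
    """Refuse to delete `resource` if it saw activity within `window`.

    `last_activity` is assumed to come from a (hypothetical) monitoring
    system; None means no activity was ever recorded. `force` is an
    explicit override that skips the check entirely.
    """
    if force or last_activity is None:
        return
    idle = now - last_activity
    if idle < window:
        raise RecentActivityError(
            f"{resource}: last activity {idle} ago, within {window}; "
            "pass force=True to override"
        )


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Idle for three hours: the check passes and the delete may proceed.
    guard_delete("lb-prod", now - timedelta(hours=3), now)
    # Queried five minutes ago: the check refuses.
    try:
        guard_delete("orders-db", now - timedelta(minutes=5), now)
    except RecentActivityError as err:
        print(err)
```

The check only raises; the caller decides what to do with a refusal, so the same guard can sit in front of load balancers, VMs, databases, or `DROP TABLE`.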

traderj0e 2 hours ago

Someone will add safeguards for all that stuff, and it ends up making it way harder to get real work done. I know in theory all of it can be done well, but in practice it's harder than it sounds.

I've seen this at work the most with slow rollouts. They said it was for prod only, then somehow it got applied to staging and dev too. They said you can force push in emergencies, but approximately 0 people on any given team know how to do this reliably, and it still takes way longer even in --force --now --breakglass --yesimeanit mode. So the end result is longer MTTR (mean time to recovery). It may prevent some kinds of outages, but you're also less likely to watch a rollout manually when it takes that long.

lokar an hour ago

If you automate it all it’s fine. The automation has no problem waiting around for traffic to drain out of something before decommissioning it.
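A minimal sketch of that drain-then-decommission automation, assuming three hypothetical hooks into your own infra: one that stops routing new traffic to the resource, one that reads its current requests per second, and one that actually deletes it. The loop deletes only after several consecutive quiet polls, and gives up without deleting if traffic never drains before the timeout. All names and defaults here are illustrative.

```python
import time
from typing import Callable


def drain_then_decommission(
    stop_new_traffic: Callable[[], None],
    current_rps: Callable[[], float],
    decommission: Callable[[], None],
    threshold_rps: float = 0.0,
    quiet_polls: int = 3,
    poll_interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Drain a resource, then decommission it once traffic has stopped.

    The three callbacks are hypothetical hooks (e.g. removing a backend
    from a load balancer, querying a metrics system, deleting the VM).
    Returns True if decommissioned, False if traffic did not drain
    before the timeout, in which case nothing is deleted.
    """
    stop_new_traffic()
    deadline = clock() + timeout_s
    quiet = 0
    while clock() < deadline:
        if current_rps() <= threshold_rps:
            quiet += 1
            if quiet >= quiet_polls:
                decommission()
                return True
        else:
            quiet = 0  # traffic came back; restart the quiet count
        sleep(poll_interval_s)
    return False
```

`sleep` and `clock` are injectable so the loop can be exercised with a fake clock instead of waiting in real time; in production the defaults use `time.sleep` and `time.monotonic`.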

traderj0e an hour ago

Then you're back to square one. The only way to win here is to require user supervision and make it simple and easy to use.