0xbadcafebee 7 hours ago

> With SSL certificates, you usually don’t have the opportunity to build up operational experience working with them, unless something goes wrong. And things don’t go wrong that often with certificates

Don't worry. With 2 or 3 industry players dictating how all TLS certs work, your certs will soon expire in weeks rather than years, so you'll all be subject to these failures more often. But as a backstop to process failures like this, use automated, immutable runbooks in CI/CD. It works like this:

1) Does it need a runbook? Ask yourself: if everything were deleted tomorrow, would you (and everyone else) remember every step needed to get it all running again? If not, it needs a runbook.

2) What's a runbook? It's a document that gives step-by-step instructions for doing a thing. The steps can be text, video recordings, code/shell snippets, etc., as long as it assumes nothing and gives all the necessary instructions (or links to them), so that a random braindead engineer at 3am can just do what it says and end up with a working thing.

3) Automate the runbook over time. Put more and more of the steps into some kind of script the user can just run. Put the script into a Docker container so that everyone's laptop environment doesn't have to be identical for the steps to work.

4) Run the containerized script from CI/CD. This ensures the credentials, environment vars, networking, etc. are the same every time it runs, which makes success more likely, and that leads to:

5) Running it frequently/on a schedule. Most CI/CD systems support scheduled jobs. Run your runbooks frequently to identify unexpected failures and fix bugs. Most of you get notifications for failed builds, so you'll see failed runbooks. If you use a cron job on a random server, the server could go down, the job could get deleted, or the reports of failure could go to /dev/null; but nobody's missing their CI/CD build failures.
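A minimal sketch of steps 3 to 5, assuming the runbook has been turned into a Python script: each written step gets an automated command, the steps run in order, and the script stops at the first failure and reports a non-zero status so the scheduled CI/CD job shows up as a failed build. The `Step` type, `run_runbook` function, and the placeholder commands are hypothetical, not part of any particular tool.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str            # the human-readable step, as written in the runbook
    command: List[str]   # the automated version of that step (hypothetical)


def run_runbook(steps: List[Step]) -> int:
    """Run each step in order and stop at the first failure.

    Returns 0 if every step succeeded, 1 otherwise, so the caller can use it
    as the process exit status and CI/CD will mark the job as failed.
    """
    for i, step in enumerate(steps, start=1):
        print(f"[{i}/{len(steps)}] {step.name}", flush=True)
        result = subprocess.run(step.command)
        if result.returncode != 0:
            print(f"FAILED at step {i} ({step.name}): exit code {result.returncode}",
                  file=sys.stderr)
            return 1
    print("Runbook completed successfully", flush=True)
    return 0


# Hypothetical placeholder steps; swap in the real commands from your runbook.
STEPS = [
    Step("Print the interpreter version", [sys.executable, "--version"]),
    Step("Run a trivial check", [sys.executable, "-c", "print('step 2 ok')"]),
]
```

In the container that CI/CD runs on a schedule, the script's entry point would be `sys.exit(run_runbook(STEPS))`, so a failing step becomes a failed build and triggers the usual notification. Keeping the human-readable `name` next to each command means the script still doubles as the written runbook.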

Running runbooks from CI/CD is a game changer. Most devs will never update a document. Some will update code they run on their laptop. But if it runs from CI/CD, now anyone can run it, and anyone can update it, so people actually do keep it up to date.
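As an example of the kind of small scheduled job this suits, here is a sketch of a certificate-expiry check in Python, assuming the CI/CD runner has outbound network access to the hosts being checked. The 14-day threshold and the function names are hypothetical; `ssl.cert_time_to_seconds`, `ssl.create_default_context`, `SSLSocket.getpeercert`, and `socket.create_connection` are from Python's standard library.

```python
import socket
import ssl
import time
from typing import Iterable, Optional

# Hypothetical policy: flag any cert with fewer than this many days left.
THRESHOLD_DAYS = 14


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a cert's notAfter time (negative if already expired).

    `not_after` is in the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


def needs_renewal(not_after: str, now: Optional[float] = None,
                  threshold_days: float = THRESHOLD_DAYS) -> bool:
    """True if the cert expires within `threshold_days`."""
    return days_until_expiry(not_after, now) < threshold_days


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a verified TLS connection and return the server cert's notAfter.

    An already-expired cert fails verification here and raises, which also
    fails the job.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check_hosts(hosts: Iterable[str]) -> bool:
    """Return True only if every host's cert is outside the renewal window."""
    all_ok = True
    for host in hosts:
        not_after = fetch_not_after(host)
        days = days_until_expiry(not_after)
        ok = not needs_renewal(not_after)
        print(f"{host}: {days:.1f} days left{'' if ok else '  <-- RENEW'}")
        all_ok = all_ok and ok
    return all_ok
```

Run daily from CI/CD with something like `sys.exit(0 if check_hosts(["example.com"]) else 1)` as the entry point, and a cert that's about to lapse becomes a failed build someone actually sees, rather than an outage.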