Remix.run Logo
jamiesonbecker 3 hours ago

SSH certs quietly hurt in prod. Short-lived creds + centralized CA just moves complexity upward without solving the core problem: user management.

The system shifts from many small local states to one highly coupled control point. That control point has to be correct and reachable all the time. When it isn’t, failures go wide instead of narrow.

Example: a few boxes get popped and start hammering the CA. Now what? Access is broken everywhere at once.

Common friction points:

     1. your signer that has to be up and correct all the time
     2. trust roots everywhere (and drifting)
     3. TTL tuning nonsense (too short = random lockouts, too long = what was the point)
     4. limited on-box state makes debugging harder than it should be
     5. failures tend to fan out instead of staying contained
Revocation is also kind of a lie. Just waiting for expiry and hoping that’s good enough.

What actually happens is people reintroduce state anyway: sidecars, caches, agents… because you need it.

We went the opposite direction:

     1. nodes pull over outbound HTTPS
     2. local authorized_keys is the source of truth locally
     3. users/roles are visible on the box
     4. drift fixes itself quickly
     5. no inbound ports, no CA signatures (WELL, not strictly true*!)
You still get central control, but operation and failure modes are local instead of "everyone is locked out right now."

That’s basically what we do at Userify (https://userify.com). Less elegant than certs, more survivable at 2am. Also actually handles authz, not just part of authn.

And the part that usually gets hand-waved with SSH CAs:

     1. creating the user account
     2. managing sudo roles
     3. deciding what happens to home directories on removal
     4. cleanup vs retention for compliance/forensics
Those don’t go away - they're just not part of the certificate solution.

* (TLS still exists here, just at the transport layer using the system trust store. That channel delivers users, keys, and roles. The rest is handled explicitly instead of implied.)

ngrilly 2 hours ago | parent [-]

How do you solve TOFU?