gerdesj 7 hours ago

To be fair, turning it off and on again is unreasonably effective.

I recently diagnosed and fixed an issue with Veeam backups that suddenly stopped working part way through the usual window and stayed broken from that point on. This particular setup has three sites (prod, my home and DR) and five backup proxies. Anyway, I read logs and Googled somewhat. I rebooted the backup server, since it looked like the issue was there - no joy. I restarted the proxies and things started working again.

The error was basically: there are no available proxies, even though they all showed as available (not working, but not giving off "not working" vibes).

I could bother with trying to look for what went wrong but life is too short. This is the first time that pattern has happened to me (I'll note it down mentally and it was logged in our incident log).

So, OK, I'll agree that a reboot should not generally be the first option. Whilst sciencing it or nerding harder is the purist approach, often a cheeky reboot gets the job done. However, do be aware that a Windows box will often decide to install updates if you are not careful 8)

rurban an hour ago | parent | next [-]

Turning it off and on again is risky. I recently upgraded a robot in Australia, had problems with systemd, so I turned it off. Then I had to wait a few weeks until it could be turned on again, because tailscaled was not set up persistently, the routing was not set up properly (over a phone), the machine had some problems, ...

High risk, low reward. But of course it's the ultimate test of whether it's properly set up.

But on the other hand, with my tiny hard real-time embedded controllers, a power cycle is the best option. No persistent state, fast power up, reboot in milliseconds. Every little SW error causes a reboot, no problem at all.

akerl_ 6 hours ago | parent | prev [-]

No, you didn’t diagnose and fix an issue.

You just temporarily mitigated it.

abrookewood 5 hours ago | parent [-]

Sometimes that is enough - especially for home machines etc.

akerl_ 5 hours ago | parent [-]

I’ve got no problem with somebody choosing to mitigate something instead of fixing it. But it’s just incorrect to apply a blind mitigation and declare that you’ve diagnosed the problem.

butvacuum 2 hours ago | parent [-]

what's the ROI on that?

-- leadership