zelon88 | 2 days ago
This, 100%. I'd like to add my reasoning for a similar failure of an HP ProLiant server I encountered. Sometimes hardware fails during a long uptime and doesn't become a problem until the next reboot. Consider a piece of hardware with 100 features. During typical use, the hardware may exercise only 50 of them. Imagine one of the unused features fails. That causes no catastrophic failure during typical use, but on startup (which rarely occurs) the feature is necessary and the system will not boot without it. If the system could get past boot, it could still perform its task, because the damaged feature isn't needed afterward. But it can't get past the boot phase, where the feature is required. Tl;dr: the system actually failed months ago, and the user didn't notice because the missing feature was not needed again until the next reboot.
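A toy sketch of that failure mode, assuming made-up feature names and sets purely for illustration: a boot-only feature dies mid-uptime, the running workload never touches it, and the fault only surfaces at the next boot.

    # Toy model of a latent hardware failure: a feature used only at
    # boot dies mid-uptime, but nothing notices until the next reboot.

    BOOT_ONLY    = {"post_memory_check", "fan_controller_init"}  # hypothetical
    RUNTIME_USED = {"dma_engine", "nic_offload", "sata_link"}    # hypothetical

    failed = set()

    def months_of_uptime():
        # The boot-only feature dies silently; runtime work never touches it.
        failed.add("fan_controller_init")
        for feature in RUNTIME_USED:
            assert feature not in failed   # still passes: workload unaffected

    def reboot():
        # Boot exercises everything, including the long-dead feature.
        for feature in BOOT_ONLY | RUNTIME_USED:
            if feature in failed:
                raise RuntimeError(f"POST halt: {feature} failed")

    months_of_uptime()   # runs fine for months
    reboot()             # first moment the dead feature matters -> boot failure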
startupsfail | 2 days ago | parent
Is there a good reason why upgrades need to stress-test the whole system? Can't they go slowly, throttling resource usage to background levels? They involve heavy CPU use and stress the whole system completely unnecessarily; during these de-facto stress tests the system can easily hit the highest temperature the device has ever seen. If something fails or gets corrupted under that strain, it's a system-level corruption...

Incidentally, Linux kernel upgrades are no better. During DKMS module rebuilds the CPU load skyrockets, and the subsequent reboot is always sketchy. There's no guarantee that nothing will go wrong; a Secure Boot issue after a kernel upgrade in particular can be a nightmare.
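For what it's worth, an updater could throttle itself without any OS-level machinery. A minimal sketch in Python (the chunk size and duty cycle are arbitrary; a real updater might use cgroups or a systemd CPUQuota instead, and os.nice is Unix-only):

    import os
    import time

    def do_chunk():
        # Stand-in for one small unit of upgrade work (hypothetical).
        sum(i * i for i in range(200_000))

    def background_update(chunks=50, duty_cycle=0.1):
        """Run work at roughly `duty_cycle` of one core by sleeping
        between chunks, at the lowest scheduler priority."""
        os.nice(19)                      # yield the CPU to everything else
        for _ in range(chunks):
            start = time.monotonic()
            do_chunk()
            busy = time.monotonic() - start
            # Idle long enough that busy time is ~duty_cycle of the period.
            time.sleep(busy * (1 - duty_cycle) / duty_cycle)

    background_update()

The trade-off is obvious: at a 10% duty cycle the upgrade takes roughly ten times longer, but peak temperature and load stay near background levels.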