Remix.run Logo
Joker_vD 17 hours ago

The tl;dr is that Postgres, as any long-running "server" process (especially as a DBMS server!) does not run with SIG_DFL as the handler for SIGTERM; it instead sets up the signal handler that merely records the fact that the signal has happened, in hopes that whatever loops are going on will eventually pick it up. As usual, some loops don't but it's very hard to notice.

namibj 13 hours ago | parent | next [-]

Feels like they should come with watchdog timers that alert when a loop fails to check in on this status often enough? Then the only thing that still needs manual checking to cover these more-exotic states is that the code path for loop watchdog-and-signal-checkin will indeed lead to early exit from the loop it's placed in (and all surrounding loops).

Sure, it's more like fuzzing then, but running in production on systems that are willing to send in big reports for logged watchdog alerts would give nearly free monitoring of very diverse and realistic workloads to hopefully cover at least relevant-in-practice codepaths with many samples before a signal ever happens to those more-rare but still relevant code paths...

bbarnett 15 hours ago | parent | prev [-]

Indeed. I've seen DBMSes take close to 10 minutes to gracefully exit, even when idle.

Timeout in sysvinit and service files, for a graceful exit, is typically 900 seconds.

Most modern DBMS daemons will recover if SIGKILL, most of the time, especially if you're using transactions. But startup will then be lagged, as it churns through and resolves on startup.

(unless you've modified the code to short cut things, hello booking.com "we're smarter than you" in 2015 at least, if not today)

sjsdaiuasgdia 12 hours ago | parent [-]

Yeah, as I've said on many incident calls...we can pay for transaction recovery at shutdown or we can pay for it at startup, but it's got to happen somewhere.

The "SIGKILL or keep waiting?" decision comes down to whether you believe the DBMS is actually making progress towards shutdown. Proving that the shutdown is progressing can be difficult, depending on the situation.