PG: How the postmaster restarts a bgworker

Code path in the postmaster

->ServerLoop
  ->WaitEventSetWait
  ->process_pm_child_exit
    ->waitpid(-1, )
    ->CleanupBackgroundWorker()

Details of CleanupBackgroundWorker:

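  • If the bgworker exits with status 0, it is treated as a normal exit and takes no part in the restart logic.

  • If it exits with status 1, it goes through the normal restart logic described below.

  • Otherwise, the exit is treated as a system-level crash and the whole instance is restarted.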
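
The three cases above can be sketched as a small classifier over the status that waitpid() returns. This is only an illustration: the enum and the function name classify_worker_exit are made up for this note and are not PostgreSQL identifiers, and the real CleanupBackgroundWorker does more bookkeeping than this.

```c
#include <sys/wait.h>

/* Hypothetical names for illustration; not PostgreSQL identifiers. */
typedef enum
{
    WORKER_EXIT_CLEAN,      /* exit code 0: normal exit, no restart */
    WORKER_EXIT_RESTART,    /* exit code 1: eligible for the normal restart logic */
    WORKER_EXIT_CRASH       /* anything else: treated as a system-level crash */
} WorkerExitAction;

/* Map a raw wait status (as filled in by waitpid) to one of the three cases. */
static WorkerExitAction
classify_worker_exit(int exitstatus)
{
    if (WIFEXITED(exitstatus) && WEXITSTATUS(exitstatus) == 0)
        return WORKER_EXIT_CLEAN;
    if (WIFEXITED(exitstatus) && WEXITSTATUS(exitstatus) == 1)
        return WORKER_EXIT_RESTART;

    /* Other exit codes, or termination by a signal such as SIGSEGV. */
    return WORKER_EXIT_CRASH;
}
```

Note that a worker killed by a signal never satisfies WIFEXITED, so it always falls into the crash case.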

Key variables:

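  • RegisteredBgWorker.rw_crashed_at: a nonzero value means the worker is considered crashed
  • HaveCrashedWorker: whether the postmaster detected a crashed bgworker in the current loop iteration
  • StartWorkerNeeded: whether a bgworker needs to be started right now. A bgworker can set a restart interval, so the postmaster does not necessarily restart every crashed bgworker on each loop iteration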
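
A rough sketch of how a restart interval interacts with a "crashed at" timestamp. Everything here is an assumption made for illustration: SketchWorker, its fields, and worker_due_for_start are hypothetical, plain time_t seconds stand in for PostgreSQL's timestamp type, and the real postmaster code is more involved.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a registered worker entry. */
typedef struct
{
    time_t crashed_at;      /* 0 = not crashed; otherwise when it crashed */
    int    restart_secs;    /* restart interval requested by the worker */
} SketchWorker;

/*
 * Return true if the worker may be started now.  When a crashed worker is
 * skipped because its interval has not elapsed yet, *still_waiting is set
 * so the caller knows it has to look again on a later loop iteration.
 */
static bool
worker_due_for_start(const SketchWorker *w, time_t now, bool *still_waiting)
{
    if (w->crashed_at == 0)
        return true;        /* no crash recorded: no delay applies */

    if (now - w->crashed_at < w->restart_secs)
    {
        *still_waiting = true;
        return false;       /* crashed too recently: skip it this round */
    }
    return true;            /* interval elapsed: restart it */
}
```

The still_waiting flag plays roughly the role the note attributes to HaveCrashedWorker: it records that some crashed worker was left unstarted in this pass.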

Appendix

waitpid

(AI-generated; I have not verified this myself.)

while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0)

If multiple child processes have already terminated before this loop runs:

  1. The loop will collect them one by one; the order is up to the operating system (POSIX leaves it unspecified)
  2. Each iteration of the loop will retrieve one zombie process
  3. The loop will continue until no terminated child is left to collect: waitpid() then returns 0 (children remain but none has exited) or -1 with errno set to ECHILD (no children at all)
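
A small standalone sketch of this reaping pattern, outside PostgreSQL. The helper names spawn_children, reap_exited_children and reap_until are invented for this note; only fork(), waitpid() with WNOHANG, and nanosleep() are real POSIX calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fork n children; child i exits immediately with code i.  0 on success. */
static int
spawn_children(int n)
{
    for (int i = 0; i < n; i++)
    {
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(i);       /* child: terminate right away */
    }
    return 0;
}

/* One non-blocking pass: reap every child that has already exited. */
static int
reap_exited_children(void)
{
    int   reaped = 0;
    int   exitstatus;
    pid_t pid;

    while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0)
        reaped++;           /* one zombie collected per iteration */

    /*
     * Here pid is 0 (children exist but none has exited yet) or -1
     * (with errno == ECHILD when there are no children at all).
     */
    return reaped;
}

/* Repeat passes, pausing 10 ms between empty ones, until n are reaped. */
static int
reap_until(int n)
{
    int             total = 0;
    struct timespec pause = {0, 10 * 1000 * 1000};

    for (int attempt = 0; attempt < 500 && total < n; attempt++)
    {
        int got = reap_exited_children();

        total += got;
        if (got == 0)
            nanosleep(&pause, NULL);
    }
    return total;
}
```

Calling spawn_children(3) followed by reap_until(3) should collect all three children; a further reap_exited_children() call then finds nothing, because waitpid() fails with ECHILD.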

When a child process terminates in a Unix/Linux system, it doesn’t immediately disappear from the system. Instead, it enters a “zombie” state (sometimes called a “defunct” state). In this state:

  1. The process has finished execution
  2. Most resources have been freed
  3. But an entry in the process table is kept to allow the parent to retrieve the child’s exit status

This zombie state persists until the parent process “reaps” the child by calling wait() or waitpid() to collect its exit status.