Commit 1b02bd8f authored by Kenan Yao

Fix postmaster reset failure on segment nodes with mirror configured

If a QE crashes for reasons such as SIGSEGV, SIGKILL or PANIC, segment
postmaster reset sometimes fails. The root cause is: the primary segment
postmaster first tells its child processes to exit, then starts a filerep
peer reset process to instruct the mirror postmaster to do a reset; the
filerep peer reset process exits only when the mirror postmaster finishes
or fails the reset procedure; the primary postmaster waits for the
termination of important processes such as AutoVacuum, BgWriter, CheckPoint
and the filerep peer reset process before it resets shared memory and
restarts auxiliary processes. However, in some cases the primary postmaster
gets stuck in the filerep peer reset step, because the mirror postmaster is
hanging or waiting for some event; when this happens, the filerep peer reset
process waits until timeout (1 hour) and retries 10 times before reporting
failure to the primary postmaster, so the primary postmaster takes 10 hours
in total to report the reset failure.
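The retry behavior can be sketched roughly as follows. This is only a
minimal illustration, not the actual Greenplum source; the constants
PEER_RESET_TIMEOUT_SECS and PEER_RESET_MAX_RETRIES and the helper
request_mirror_reset_and_wait() are assumed names based on the
description above:

    /*
     * Rough sketch of the filerep peer reset retry loop described above.
     * All identifiers here are hypothetical placeholders, not the actual
     * Greenplum names.
     */
    #include <stdbool.h>

    #define PEER_RESET_TIMEOUT_SECS   (60 * 60)   /* 1 hour per attempt */
    #define PEER_RESET_MAX_RETRIES    10          /* 10 attempts in total */

    /* ask the mirror to reset and poll until it finishes, fails or times out */
    extern bool request_mirror_reset_and_wait(int timeout_secs);

    static bool
    filerep_peer_reset(void)
    {
        for (int attempt = 0; attempt < PEER_RESET_MAX_RETRIES; attempt++)
        {
            if (request_mirror_reset_and_wait(PEER_RESET_TIMEOUT_SECS))
                return true;        /* mirror completed its reset */
        }

        /*
         * Only now does the primary postmaster learn about the failure,
         * i.e. roughly 10 hours after the reset started.
         */
        return false;
    }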

This happens almost every time when the mirror segment host machine has poor
performance, for the following reason: the mirror postmaster performs a reset
procedure similar to the primary postmaster's, i.e. it notifies its child
processes to exit, waits for their termination, and then restarts the
auxiliary processes; the filerep peer reset process first connects to the
mirror postmaster to request a postmaster reset, and then checks the reset
status of the mirror every 10ms by connecting to the mirror postmaster again.
The filerep peer reset process therefore keeps connecting to the mirror
postmaster, which continuously forks new dead_end backend processes, while at
the same time the mirror postmaster is waiting for all dead_end backend
processes to exit. If new dead_end processes are created faster than existing
ones exit, the mirror postmaster never sees its children cleared. All in all,
this can lead to a hang and to the failure of the postmaster reset.
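The polling side of that race can be sketched as follows (again an
illustration with assumed names, not the actual source). Every status
check opens a fresh connection to the mirror postmaster, and each such
connection forks one more dead_end backend on the mirror while the mirror
is waiting for its BackendList to drain:

    /*
     * Rough sketch of the 10ms status-polling loop in the filerep peer
     * reset process; mirror_reset_completed() is a hypothetical helper
     * that connects to the mirror postmaster and asks whether the reset
     * is done. Each call forks one more dead_end backend on the mirror,
     * so on a slow mirror host the dead_end children can accumulate
     * faster than they exit.
     */
    #include <stdbool.h>
    #include <unistd.h>

    extern bool mirror_reset_completed(void);

    static void
    wait_for_mirror_reset(void)
    {
        while (!mirror_reset_completed())
            usleep(10 * 1000);      /* re-check every 10ms, as described above */
    }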

This issue exists for master postmaster reset as well under heavy workload.
Parent aa02fa06
@@ -5027,10 +5027,10 @@ static void do_reaper()
     /*
      * Wait for all important children to exit, then reset shmem and
-     * redo database startup. (We can ignore the archiver and stats processes
-     * here since they are not connected to shmem.)
+     * redo database startup. (We can ignore the syslogger, archiver and stats
+     * processes here since they are not connected to shmem.)
      */
-    if (DLGetHead(BackendList) ||
+    if (CountChildren(BACKEND_TYPE_ALL) != 0 ||
         StartupPID != 0 ||
         StartupPass2PID != 0 ||
         StartupPass3PID != 0 ||
@@ -5048,6 +5048,25 @@ static void do_reaper()
         goto reaper_done;
     }

+    /*
+     * Start waiting for dead_end children to die. This state change causes
+     * ServerLoop to stop creating new ones. Otherwise, we may infinitely
+     * wait here on heavy workload circumstances, or in postmaster reset
+     * cases of segments where FilerepPeerReset process on primary segment
+     * continuously connects corresponding mirror postmaster.
+     */
+    if (DLGetHead(BackendList) != NULL)
+    {
+        pmState = PM_CHILD_STOP_WAIT_DEAD_END_CHILDREN;
+        goto reaper_done;
+    }
+
+    /*
+     * NB: We cannot change the pmState to PM_CHILD_STOP_NO_CHILDREN here,
+     * since there should be syslogger existing, and maybe archiver and
+     * pgstats as well.
+     */
+
     if ( RecoveryError )
     {
         ereport(LOG,
@@ -5677,9 +5696,6 @@ static PMState StateMachineCheck_WaitBackends(void)
     }
     else
     {
-        /*
-         * This state change causes ServerLoop to stop creating new ones.
-         */
         Assert(Shutdown > NoShutdown);
         moveToNextState = true;
     }