Enlarge timeout in isolation2:pg_ctl UDF (#9991)
Currently this UDF might report a false positive if the node is still
starting up when the timeout expires, because pg_ctl returns 0 in that case.
This behavior was changed upstream by the patch below:

    commit f13ea95f
    Author: Tom Lane <tgl@sss.pgh.pa.us>
    Date:   Wed Jun 28 17:31:24 2017 -0400

        Change pg_ctl to detect server-ready by watching status in postmaster.pid.

We've seen some test flakiness due to this issue, since pg_ctl restart
sometimes needs more time in the pipeline (the default pg_ctl timeout is 60
seconds). Yesterday I found, on a hung job, that a primary needed ~4 minutes
to finish recovery during 'pg_ctl restart'. (The test was
ao_same_trans_truncate_crash, which enables fsync. Even though it launches a
checkpoint before pg_ctl restart, the restart still takes a long time.)

Enlarge the pg_ctl timeout to 600 seconds and add a check of the pg_ctl
stdout before returning OK in the UDF. This check can be removed after the
PG 12 merge finishes, so I added a FIXME there.

Here is the output of the pg_ctl experiment:

    $ pg_ctl -l postmaster.log -D /data/gpdb7/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0 -w -m immediate restart -t 1
    waiting for server to shut down.... done
    server stopped
    waiting for server to start.... stopped waiting
    server is still starting up
    $ echo $?
    0

Reviewed-by: NAsim R P <apraveen@pivotal.io>
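The stdout check described in this message can be sketched roughly as follows. This is a minimal illustration, not the code of the actual UDF: the names `pg_ctl_output_ok` and `run_pg_ctl` and the default log file name are hypothetical, and the "server is still starting up" string is taken from the experiment output in this message. The idea is that an exit code of 0 alone is not trusted; the captured output must also not say that the server is still starting.

```python
import subprocess

# Message pg_ctl printed in the experiment when it gave up waiting
# while still exiting with status 0.
STILL_STARTING = "server is still starting up"


def pg_ctl_output_ok(returncode, stdout):
    """Return True only if pg_ctl exited 0 and did not report a timeout."""
    return returncode == 0 and STILL_STARTING not in stdout


def run_pg_ctl(datadir, command="restart", timeout_secs=600,
               logfile="postmaster.log"):
    """Hypothetical wrapper: run pg_ctl and verify both exit code and output."""
    cmd = [
        "pg_ctl",
        "-l", logfile,            # send server log to a file, not our pipe
        "-D", datadir,
        "-w",                     # wait for the operation to complete
        "-t", str(timeout_secs),  # enlarged from the 60 second default
        "-m", "immediate",
        command,
    ]
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    if not pg_ctl_output_ok(proc.returncode, proc.stdout):
        raise RuntimeError("pg_ctl %s failed:\n%s" % (command, proc.stdout))
    return "OK"
```

Keeping the check in a small pure function makes it easy to delete once pg_ctl itself returns a nonzero status for this case.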