    Fix various issues in Gang management (#8893) · 797065c5
    Paul Guo committed
    1. Do not call elog(FATAL) in cleanupQE(), since cleanupQE() can be called
    from cdbdisp_destroyDispatcherState() while a CdbDispatcherState is being
    destroyed. The FATAL error makes the process exit, which aborts the
    transaction and calls cdbdisp_destroyDispatcherState() again through the
    resource owner callback; this reentrance is not supported. Change the code
    to return false instead, and add a sanity check against reentrance.
    Returning false should be fine, since it causes the gang to be destroyed and
    the QE resources should then be released along with it. Here is a typical
    reentrance stack:
    
    0x0000000000b8ffeb in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:345
    0x0000000000b90385 in cleanup_dispatcher_handle (h=0x2eff0d8) at cdbdisp.c:488
    0x0000000000b904c0 in cdbdisp_cleanupDispatcherHandle (owner=0x2e80de0) at cdbdisp.c:555
    0x0000000000b27fb7 in CdbResourceOwnerWalker (owner=0x2e80de0, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1375
    0x0000000000b27fd8 in CdbResourceOwnerWalker (owner=0x2f30358, callback=0xb90479 <cdbdisp_cleanupDispatcherHandle>) at resowner.c:1379
    0x0000000000b903d9 in AtAbort_DispatcherState () at cdbdisp.c:511
    0x000000000053b8ab in AbortTransaction () at xact.c:3319
    0x000000000053e057 in AbortOutOfAnyTransaction () at xact.c:5248
    0x00000000005c6869 in RemoveTempRelationsCallback (code=1, arg=0) at namespace.c:4088
    0x000000000093c193 in shmem_exit (code=1) at ipc.c:257
    0x000000000093c088 in proc_exit_prepare (code=1) at ipc.c:214
    0x000000000093bf86 in proc_exit (code=1) at ipc.c:104
    0x0000000000adb6e2 in errfinish (dummy=0) at elog.c:754
    0x0000000000ade465 in elog_finish (elevel=21, fmt=0xe847c0 "cleanup called when a segworker is still busy") at elog.c:1735
    0x0000000000beca81 in cleanupQE (segdbDesc=0x2ee9048) at cdbutil.c:846
    0x0000000000becbc8 in cdbcomponent_recycleIdleQE (segdbDesc=0x2ee9048, forceDestroy=0 '\000') at cdbutil.c:871
    0x0000000000b9815a in RecycleGang (gp=0x2eff7f0, forceDestroy=0 '\000') at cdbgang.c:861
    0x0000000000b9009e in cdbdisp_destroyDispatcherState (ds=0x2eff168) at cdbdisp.c:372
    0x0000000000b96957 in CdbDispatchCopyStart (cdbCopy=0x2f23828, stmt=0x2e364d0, flags=5) at cdbdisp_query.c:1442
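
    A minimal sketch of the intended control flow for item 1, under
    simplifying assumptions: the types QE, Gang and DispatcherState and the
    functions cleanup_qe(), recycle_gang() and destroy_dispatcher_state() are
    hypothetical stand-ins, not the real cleanupQE(), RecycleGang() or
    cdbdisp_destroyDispatcherState(). A busy QE makes cleanup_qe() return false
    instead of raising FATAL, the caller then destroys the gang rather than
    recycling it, and an assertion guards against reentering
    destroy_dispatcher_state().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a QE connection (not the real descriptor). */
typedef struct QE {
    bool busy;       /* still executing a command */
    bool destroyed;  /* connection has been torn down */
} QE;

typedef struct Gang {
    QE  *qes;
    int  size;
    bool destroyed;
} Gang;

typedef struct DispatcherState {
    Gang *gang;
    bool  destroying;  /* reentrance guard */
} DispatcherState;

/* Instead of raising a FATAL error for a busy QE, report failure to the caller. */
static bool cleanup_qe(QE *qe)
{
    if (qe->busy)
        return false;  /* caller must destroy rather than recycle */
    return true;
}

static void destroy_qe(QE *qe)
{
    qe->busy = false;
    qe->destroyed = true;
}

/* If every QE cleans up, the gang stays intact for reuse; otherwise the whole
 * gang is destroyed, which also releases the QE resources. */
static void recycle_gang(Gang *gang)
{
    bool ok = true;

    for (int i = 0; i < gang->size; i++)
        if (!cleanup_qe(&gang->qes[i]))
            ok = false;

    if (!ok)
    {
        for (int i = 0; i < gang->size; i++)
            destroy_qe(&gang->qes[i]);
        gang->destroyed = true;
    }
}

static void destroy_dispatcher_state(DispatcherState *ds)
{
    /* Sanity check: the same state must never be destroyed reentrantly,
     * e.g. again from an abort callback while this call is in progress. */
    assert(!ds->destroying);
    ds->destroying = true;

    if (ds->gang != NULL)
    {
        recycle_gang(ds->gang);
        ds->gang = NULL;
    }
}
```

    With one busy QE, destroy_dispatcher_state() ends with the gang marked
    destroyed and no error raised; with only idle QEs the gang is kept.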
    
    2. Force the reader gang of a named portal to be dropped if a SET command
    was executed earlier, since that setting was not dispatched to that gang
    and the gang therefore should not be reused.
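
    One way to model item 2, assuming a hypothetical session-wide counter that
    every SET command increments: a reader gang remembers the counter value it
    was created with, and a named portal drops the gang instead of reusing it
    when the values differ, because the newer setting never reached that gang.
    The counter, ReaderGang and the helper functions are illustrative names
    only; the commit does not say how the real code detects the earlier SET.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical session-level counter, bumped by every SET command. */
static unsigned long set_generation = 0;

typedef struct ReaderGang {
    unsigned long created_at_generation;  /* SET state the gang was built with */
    bool          destroyed;
} ReaderGang;

static void on_set_command(void)
{
    set_generation++;
}

static ReaderGang make_reader_gang(void)
{
    ReaderGang gang = { set_generation, false };
    return gang;
}

/* A gang may be reused only if no SET has happened since it was created. */
static bool reader_gang_reusable(const ReaderGang *gang)
{
    return !gang->destroyed && gang->created_at_generation == set_generation;
}

/* Called before a named portal reuses its reader gang: drop stale gangs. */
static void maybe_drop_reader_gang(ReaderGang *gang)
{
    if (!reader_gang_reusable(gang))
        gang->destroyed = true;
}
```

    A generation counter avoids tracking which individual settings changed;
    any SET simply invalidates every reader gang created before it.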
    
    3. Now that the DispatcherState is destroyed by a resource owner callback
    when a transaction aborts, some dispatcher code no longer needs to destroy
    it explicitly.
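
    Item 3 relies on abort-time cleanup through resource owner callbacks. The
    sketch below is a simplified, hypothetical model: a plain linked list
    stands in for the resource owner, and at_abort_dispatcher_state() plays the
    role that AtAbort_DispatcherState() plays in the stack trace above,
    destroying every dispatcher state that an error path left behind, so the
    error path in the hypothetical dispatch_command() needs no explicit
    destroy call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical dispatcher handle, registered while its state is alive. */
typedef struct DispatcherHandle {
    struct DispatcherHandle *next;
} DispatcherHandle;

/* Stand-in for the resource owner: list of live handles. */
static DispatcherHandle *open_handles = NULL;
static int destroyed_count = 0;

static DispatcherHandle *allocate_handle(void)
{
    DispatcherHandle *h = calloc(1, sizeof(*h));
    if (h == NULL)
        abort();
    h->next = open_handles;
    open_handles = h;
    return h;
}

static void destroy_handle_state(DispatcherHandle *h)
{
    (void) h;
    destroyed_count++;  /* real code would tear down gangs and results here */
}

/* Normal completion: unregister the handle and destroy its state. */
static void release_handle(DispatcherHandle *h)
{
    DispatcherHandle **p = &open_handles;
    while (*p != h)
    {
        assert(*p != NULL);  /* h must be registered */
        p = &(*p)->next;
    }
    *p = h->next;
    destroy_handle_state(h);
    free(h);
}

/* Abort callback: destroy every dispatcher state still registered. */
static void at_abort_dispatcher_state(void)
{
    while (open_handles != NULL)
    {
        DispatcherHandle *h = open_handles;
        open_handles = h->next;
        destroy_handle_state(h);
        free(h);
    }
}

/* Hypothetical dispatch routine: the error path just returns and leaves the
 * handle registered for the abort callback to clean up. */
static bool dispatch_command(bool simulate_error)
{
    DispatcherHandle *h = allocate_handle();
    if (simulate_error)
        return false;  /* no explicit cleanup on the error path */
    release_handle(h);
    return true;
}
```

    Keeping a single cleanup point on abort means each error path cannot
    forget, or repeat, the destroy call.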
    
    The added test cases, together with some existing ones, cover almost all of
    the code changes except the change in cdbdisp_dispatchX(). (I could not
    find a way to test it, and I will keep looking for a way to test that or
    similar code.)
    
    Reviewed-by: Pengzhou Tang
    Reviewed-by: Asim R P