    Fix interconnect hung issue · 7c90c04f
    Pengzhou Tang committed
    We have hit the interconnect hang many times in many cases, all
    with the same pattern: the downstream interconnect motion senders
    keep sending tuples, unaware that the upstream nodes have already
    finished and quit execution. The QD, having received enough
    tuples, then waits for all QEs to quit, which causes a deadlock.
    
    Many nodes may quit execution early, e.g. LIMIT, HashJoin and Nest
    Loop. To resolve the hang, they need to stop the interconnect
    stream explicitly by calling ExecSquelchNode(). However, we cannot
    do that in rescan cases, where data might be lost (e.g. commit
    2c011ce4). For rescan cases, commit 02213a73 tried using
    QueryFinishPending to stop the senders: the senders check this
    flag and quit. That commit has its own problems. First,
    QueryFinishPending can only be set by the QD, so it does not work
    for INSERT or UPDATE cases. Second, that commit only lets the
    senders detect the flag and quit the loop abruptly (without
    sending EOS to the receiver), so the receiver may still be stuck
    receiving tuples.
    
    This commit first reverts the QueryFinishPending approach.
    
    To resolve the hang, we move TeardownInterconnect() ahead of
    cdbdisp_checkDispatchResult(), which guarantees that the
    interconnect stream is stopped before the QD waits for and checks
    the status of the QEs.
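
    As a rough illustration of why the ordering matters, here is a toy
    model in C. It is only a sketch: every toy_ name is hypothetical,
    the "blocked sender" is reduced to a boolean, and none of this is
    the actual Greenplum code.

```c
#include <stdbool.h>

/* Toy model of one query; all names are hypothetical stand-ins. */
typedef struct {
    bool interconnect_open; /* QD side of the stream is still up */
    bool sender_blocked;    /* a QE motion sender is still trying to send */
} ToyQuery;

/* Stand-in for TeardownInterconnect(): stopping the stream lets the
 * blocked sender notice the STOP and finish. */
static void toy_teardown_interconnect(ToyQuery *q)
{
    q->interconnect_open = false;
    q->sender_blocked = false;
}

/* Stand-in for cdbdisp_checkDispatchResult(): reports whether the QEs
 * could finish. The real function would wait; the toy just returns
 * false where the real system would hang. */
static bool toy_check_dispatch_result(const ToyQuery *q)
{
    return !q->sender_blocked;
}

/* Runs the two steps in either order and reports whether the QEs
 * were able to finish at the time they were checked. */
static bool toy_finish_query(ToyQuery *q, bool teardown_first)
{
    bool finished;

    if (teardown_first) {
        toy_teardown_interconnect(q);            /* new order */
        finished = toy_check_dispatch_result(q);
    } else {
        finished = toy_check_dispatch_result(q); /* old order */
        toy_teardown_interconnect(q);
    }
    return finished;
}
```

    With teardown_first set, the check sees no blocked sender; with
    the old order, the check runs while the sender is still blocked.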
    
    For UDPIFC, TeardownInterconnect() removes the ic entries, so any
    packets for this interconnect context are treated as 'past'
    packets and are acked with the STOP flag.
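
    The UDPIFC behaviour can be pictured with a small sketch. The
    table layout, the flag values and the toy_ functions below are
    assumptions made for illustration; the real UDPIFC data structures
    and packet flags differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ack flags, not the real UDPIFC flag values. */
#define TOY_FLAG_ACK  0x1u
#define TOY_FLAG_STOP 0x2u

/* Hypothetical table of active interconnect context ids. */
typedef struct {
    int    ic_ids[8];
    size_t count;
} ToyIcTable;

/* Models teardown removing the entry for one interconnect context.
 * Returns true if an entry was found and removed. */
static bool toy_ic_remove(ToyIcTable *t, int ic_id)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->ic_ids[i] == ic_id) {
            t->ic_ids[i] = t->ic_ids[t->count - 1]; /* swap with last */
            t->count--;
            return true;
        }
    }
    return false;
}

/* Chooses the ack flags for an incoming packet: a packet whose context
 * has no entry is treated as a 'past' packet and acked with STOP. */
static unsigned toy_ack_flags(const ToyIcTable *t, int pkt_ic_id)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->ic_ids[i] == pkt_ic_id)
            return TOY_FLAG_ACK;
    }
    return TOY_FLAG_ACK | TOY_FLAG_STOP;
}
```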
    
    For TCP, TeardownInterconnect() closes all connections to its
    children; the children treat any readable data on the connection,
    including the closure itself, as a STOP message.
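
    A minimal sketch of the TCP-side check, assuming a POSIX system.
    peer_requested_stop() and demo_stop_on_close() are hypothetical
    helpers written for this note, and a UNIX socket pair stands in
    for the TCP connection; the actual sender code is not shown here.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking check of the sender's connection.
 * Anything readable, including EOF from a closed peer, counts as STOP.
 * A poll() error or timeout is reported as "no STOP seen". */
static bool peer_requested_stop(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&p, 1, 0); /* timeout 0: return immediately */

    if (n <= 0)
        return false;
    return (p.revents & (POLLIN | POLLHUP | POLLERR)) != 0;
}

/* Demo: the receiver side closing its end is seen as STOP by the
 * sender side. Returns 1 if the sketch behaves as described. */
static int demo_stop_on_close(void)
{
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);

    bool before = peer_requested_stop(sv[0]); /* nothing readable yet */
    close(sv[1]);                             /* receiver tears down */
    bool after = peer_requested_stop(sv[0]);  /* closure is readable */

    close(sv[0]);
    return (!before && after) ? 1 : 0;
}
```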
    
    No test case is included; commits 2c011ce4 and 02213a73 each
    contain one.
nodeSubplan.c 44.7 KB