src/backend/executor/nodeSubplan.c · 7c90c04f07d2ce08c71927889b650eb98f23ee79 · Greenplum / Gpdb

Committed by Pengzhou Tang on Mar 24, 2020

We have hit interconnect hang issues many times, in many different
cases, and they all share the same pattern: the downstream
interconnect motion senders keep sending tuples, unaware that the
upstream nodes have already finished and quit execution early. The
QD then gets enough tuples and waits for all QEs to quit, which
causes a deadlock.

Many nodes may quit execution early, e.g. LIMIT, HashJoin and
NestLoop. To resolve the hang, they need to stop the interconnect
stream explicitly by calling ExecSquelchNode(); however, we cannot
do that for rescan cases, in which data might be lost (e.g. commit
2c011ce4). For rescan cases, we tried using QueryFinishPending to
stop the senders in commit 02213a73, letting the senders check
this flag and quit. That commit has its own problems. First,
QueryFinishPending can only be set by the QD, so it doesn't work
for INSERT or UPDATE cases. Second, that commit only lets the
senders detect the flag and quit the loop in a rude way (without
sending EOS to their receiver), so the receiver may still be stuck
waiting to receive tuples.
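To see why a rude quit is not enough, consider a toy model of a
motion receiver that may only finish once it has seen EOS from every
sender. MsgKind, receiver_can_finish() and the 64-sender limit are
hypothetical names chosen for illustration, not Greenplum's
interconnect API; the model only shows that a sender which quits
without sending EOS leaves its receiver waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SENDERS 64   /* arbitrary limit for this sketch */

/* Hypothetical message kinds; not Greenplum's actual interconnect types. */
typedef enum { MSG_TUPLE, MSG_EOS } MsgKind;

/*
 * A receiver may only finish after every sender has delivered EOS.
 * msgs[i] was produced by sender sender_of[i], 0 <= sender_of[i] < n_senders.
 * Returns true if all n_senders delivered EOS somewhere in msgs.
 */
static bool receiver_can_finish(const MsgKind *msgs, const int *sender_of,
                                size_t n_msgs, int n_senders)
{
    bool seen_eos[MAX_SENDERS] = { false };
    int  eos_count = 0;

    assert(n_senders > 0 && n_senders <= MAX_SENDERS);
    for (size_t i = 0; i < n_msgs; i++)
    {
        int s = sender_of[i];

        assert(s >= 0 && s < n_senders);
        /* Count each sender's EOS once; tuples do not advance completion. */
        if (msgs[i] == MSG_EOS && !seen_eos[s])
        {
            seen_eos[s] = true;
            eos_count++;
        }
    }
    return eos_count == n_senders;
}
```

If sender 1 detects a flag and simply breaks out of its send loop,
its EOS never appears in the stream, and receiver_can_finish()
stays false no matter how long the receiver waits.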

This commit first reverts the QueryFinishPending method.

To resolve the hang, we move TeardownInterconnect() ahead of
cdbdisp_checkDispatchResult(), which guarantees that the
interconnect stream is stopped before we wait for the QEs and
check their status.
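The effect of the reordering can be sketched with a small
deterministic model. Each QE's motion sender pushes tuples into a
bounded buffer that the QD no longer drains; teardown_model() and
wait_for_qes() are hypothetical stand-ins for TeardownInterconnect()
and cdbdisp_checkDispatchResult(), and the buffer size and tuple
counts are arbitrary assumptions. A wait round in which no sender
makes progress is reported as a deadlock.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_QES 3

/* Hypothetical model of one QE's motion sender. */
typedef struct
{
    int  tuples_left;   /* tuples this QE still wants to send */
    bool stopped;       /* set once the sender has been told to STOP */
    bool finished;      /* the QE has quit */
} SenderModel;

/* One scheduling step; returns true if the sender made progress. */
static bool sender_step(SenderModel *s, int *free_slots)
{
    if (s->finished)
        return false;
    if (s->stopped || s->tuples_left == 0)
    {
        s->finished = true;     /* STOP received or all tuples sent: quit */
        return true;
    }
    if (*free_slots > 0)
    {
        (*free_slots)--;        /* buffer slot is never freed: QD stopped reading */
        s->tuples_left--;
        return true;
    }
    return false;               /* blocked on a full buffer */
}

/* Stand-in for TeardownInterconnect(): tell every sender to stop. */
static void teardown_model(SenderModel *qes)
{
    for (int i = 0; i < NUM_QES; i++)
        qes[i].stopped = true;
}

/* Stand-in for cdbdisp_checkDispatchResult(): wait for all QEs to quit.
 * Returns false if a full round makes no progress (a deadlock). */
static bool wait_for_qes(SenderModel *qes, int *free_slots)
{
    for (;;)
    {
        bool all_done = true, progress = false;

        for (int i = 0; i < NUM_QES; i++)
        {
            progress |= sender_step(&qes[i], free_slots);
            all_done &= qes[i].finished;
        }
        if (all_done)
            return true;
        if (!progress)
            return false;
    }
}

/* Returns true if the QD gets past the wait under the given ordering. */
static bool qd_finish(bool teardown_first)
{
    SenderModel qes[NUM_QES];
    int free_slots = 4;         /* small buffer the QD no longer drains */

    for (int i = 0; i < NUM_QES; i++)
        qes[i] = (SenderModel){ .tuples_left = 10, .stopped = false, .finished = false };

    if (teardown_first)
        teardown_model(qes);    /* new order: stop the streams first */
    /* In the old order teardown would only run after this wait, which in
     * the real system blocks forever; the model returns false instead. */
    bool ok = wait_for_qes(qes, &free_slots);

    assert(free_slots >= 0);
    return ok;
}
```

With qd_finish(false) the senders fill the buffer and block, so the
wait reports a deadlock; with qd_finish(true) every sender sees STOP
first and quits, and the wait completes.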

For UDPIFC, TeardownInterconnect() removes the ic entries; any
packets for this interconnect context are then treated as 'past'
packets and acked with the STOP flag.
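A simplified sketch of this UDPIFC behavior, assuming the receiver
keeps a table of active interconnect contexts and that a packet whose
context has no entry counts as a 'past' packet. The struct and
function names (IcEntryTable, handle_packet() and so on) are
hypothetical and do not mirror the real UDPIFC code; the point is
only that once teardown removes the entry, the next ack carries STOP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IC_ENTRIES 8

/* Hypothetical, simplified packet and ack headers. */
typedef struct
{
    int ic_id;   /* which interconnect context the packet belongs to */
    int seq;     /* packet sequence number */
} PacketHeader;

typedef struct
{
    int  seq;    /* sequence number being acknowledged */
    bool stop;   /* tells the sender to stop sending */
} AckPacket;

/* Active interconnect contexts known to this receiver. */
typedef struct
{
    int    ic_ids[MAX_IC_ENTRIES];
    size_t count;
} IcEntryTable;

static void ic_entry_add(IcEntryTable *t, int ic_id)
{
    assert(t->count < MAX_IC_ENTRIES);
    t->ic_ids[t->count++] = ic_id;
}

/* Teardown in this model: forget the context's entry. */
static void ic_entry_remove(IcEntryTable *t, int ic_id)
{
    for (size_t i = 0; i < t->count; i++)
    {
        if (t->ic_ids[i] == ic_id)
        {
            t->ic_ids[i] = t->ic_ids[--t->count];   /* swap-remove */
            return;
        }
    }
}

static bool ic_entry_exists(const IcEntryTable *t, int ic_id)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->ic_ids[i] == ic_id)
            return true;
    return false;
}

/* Ack an incoming packet; no entry means a 'past' packet, so ack with STOP. */
static AckPacket handle_packet(const IcEntryTable *t, PacketHeader pkt)
{
    AckPacket ack = { .seq = pkt.seq, .stop = !ic_entry_exists(t, pkt.ic_id) };

    return ack;
}
```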

For TCP, TeardownInterconnect() closes all connections with its
children; the children treat any readable data on the connection,
including the closure itself, as a STOP message.
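On the TCP side, the sender-side check can be sketched with POSIX
poll(): a zero-timeout poll on the connection to the receiver reports
it readable both when data arrives and when the peer has closed it,
and either case is treated as STOP. peer_requested_stop() is a
hypothetical helper, not a Greenplum function.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>   /* socketpair(), used in the usage example */
#include <unistd.h>       /* close(), used in the usage example */

/*
 * Non-blocking check on the connection back to the receiver: readable
 * data or an orderly close by the peer (where recv() would return 0)
 * both count as a STOP request.
 */
static bool peer_requested_stop(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    assert(fd >= 0);
    if (poll(&pfd, 1, 0) <= 0)
        return false;     /* nothing readable yet, or poll() failed */
    return (pfd.revents & (POLLIN | POLLHUP)) != 0;
}
```

For example, with socketpair(AF_UNIX, SOCK_STREAM, 0, sv), the check
on sv[0] returns false while sv[1] is open and idle, and true once
sv[1] writes anything or is closed.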

A test case is not included; commits 2c011ce4 and 02213a73 each
contain one.
