Fix confusion with distribution keys of queries with FULL JOINs.
There was some confusion on how NULLs are distributed when CdbPathLocus is of Hashed or HashedOJ type. The comment in cdbpathlocus.h suggested that NULLs can be on any segment. But the rest of the code assumed that that's true only for HashedOJ, and that for Hashed, all NULLs are stored on a particular segment. There was a comment in cdbgroup.c that said "Or would HashedOJ ok, too?"; the answer to that is "No!". Given the comment in cdbpathlocus.h, I'm not surprised that the author was not very sure about that. Clarify the comments in cdbpathlocus.h and cdbgroup.c on that.

There were a few cases where we got that actively wrong. The repartitionPlan() function is used to inject a Redistribute Motion into queries used for CREATE TABLE AS and INSERT, if the "current" locus didn't match the target table's policy. It did not check for HashedOJ. Because of that, if the query contained FULL JOINs, NULL values might end up on all segments. Code elsewhere, particularly in cdbgroup.c, assumes that all NULLs in a table are stored on a single segment, identified by the cdbhash value of a NULL datum. Fix that by adding a check for HashedOJ in repartitionPlan(), and forcing a Redistribute Motion.

CREATE TABLE AS had a similar problem, in the code that decides which distribution key to use if the user didn't specify DISTRIBUTED BY explicitly. The default behaviour is to choose a distribution key that matches the distribution of the query, so that we can avoid adding an extra Redistribute Motion. After fixing repartitionPlan(), there was no correctness problem, but if we chose the key based on a HashedOJ locus, there is no performance benefit because we'd need a Redistribute Motion anyway. So modify the code that chooses the CTAS distribution key to ignore HashedOJ.

While we're at it, refactor the code that chooses the CTAS distribution key by moving it to a separate function. It had become ridiculously deeply indented.
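The two decisions described above can be illustrated with a small standalone sketch. This is not the actual GPDB code: the types and functions below (SketchLocus, sketch_needs_redistribute, sketch_choose_ctas_key) are hypothetical simplifications that model a locus as a type plus a single key column, and only mirror the rules stated in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the locus types discussed above. */
typedef enum
{
	SKETCH_LOCUS_HASHED,	/* hashed on the key; all NULL keys on one segment */
	SKETCH_LOCUS_HASHED_OJ,	/* outer-join result; NULL keys may be on any segment */
	SKETCH_LOCUS_OTHER		/* anything else (not hash-distributed on a key) */
} SketchLocusType;

typedef struct
{
	SketchLocusType type;
	int			key_column;	/* assumed single-column key, -1 if none */
} SketchLocus;

/*
 * Does a query result with this locus need a Redistribute Motion before it
 * can be stored in a table distributed on target_key?
 *
 * A HashedOJ locus always needs one, even if the key matches, because its
 * NULLs are not guaranteed to be on the segment that a table's NULLs must
 * live on.
 */
bool
sketch_needs_redistribute(SketchLocus locus, int target_key)
{
	if (locus.type == SKETCH_LOCUS_HASHED_OJ)
		return true;
	if (locus.type == SKETCH_LOCUS_HASHED && locus.key_column == target_key)
		return false;
	return true;
}

/*
 * Choose a distribution key for CREATE TABLE AS when the user gave no
 * DISTRIBUTED BY. Reuse the query's key only for a plain Hashed locus;
 * a HashedOJ locus is ignored, since a Redistribute Motion would be needed
 * anyway and matching its key saves nothing.
 */
int
sketch_choose_ctas_key(SketchLocus locus, int fallback_key)
{
	assert(fallback_key >= 0);

	if (locus.type == SKETCH_LOCUS_HASHED && locus.key_column >= 0)
		return locus.key_column;
	return fallback_key;
}
```

In this model, a result with a HashedOJ locus on column 2 written to a table distributed on column 2 still reports that a Redistribute Motion is needed, while the same key under a plain Hashed locus does not.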
Fixes https://github.com/greenplum-db/gpdb/issues/6154, and adds tests.

Reviewed-by: Melanie Plageman <mplageman@pivotal.io>