-
由 Hans Zeller 提交于
The CExpandNAryJoinDPv2 transform now recognizes the CScalarNAryJoinPredList operator for NAry joins containing outer joins. We now transition gracefully from exhaustive to a MinCard algorithm when we exceed the 10-table join limit in the DPv2 transform. Preprocessor changes: - Because we sometimes call the preprocessor multiple times on a tree (e.g. with CTEs), we needed to make the calls idempotent - Handle the new NAry join correctly in CExpressionPreprocessor::PexprOuterJoinToInnerJoin() - Improve constraint and max card computation for NAry join with left outer joins in it Changes to the CJoinOrderDPv2 class: - Changed base class, CJoinOrder to add information about which vertex a given edge belongs to, if it is an ON predicate - Added a new CJoinOrder constructor to compute this new information - New data structures to describe join levels: We still have an array of levels. Each level is now an SLevelInfo struct. Each SLevelInfo object has an array of SGroupInfo structs. Each SGroupInfo struct has a bitset of atoms (a component) and an expression (SExpressionInfo). The SExpressionInfo now has pointers to child SGroupInfos, in addition to the expression itself, to help computing the cost. - Cost now has the cardinality of the join itself. To avoid computing stats too many times, the stats are stored in each group and we have a new map to look up already computed stats, given a bitset (BitSetToGroupInfoMap). - The cost function also penalizes cross-products, similar to what the NLJ penalty (EcpNLJFactor) does. - We no longer make a list of CExpressionArrays and then pick the cheapest of each ExpressionArray. Instead, we delete expressions that aren't the cheapest in their bitset immediately. To do that, we store all the good expressions in the CJoinOrderDPv2, not in temporary variables. - We limit the number of groups in a given level to the number of groups in a 10-way join. In other words, joins up to 10-way should be enumerated exhaustively, larger joins will be restricted and levels greater than 10 have only one group, like for MinCard. - To implement the limits for groups and also for the top k expression at the highest level, we added a k-heap template struct. Changes to statistics estimation: - Use stats expression for children when deriving stats in higher groups - Use histograms to compute the NDVs for join columns, don't override the scale factor later with the global NDVs (that ignores the effect of some predicates) - We will also need a fix to the join damping factor, see https://github.com/greenplum-db/gporca/pull/561Co-authored-by: NSambitesh Dash <sdash@pivotal.io> Co-authored-by: NHans Zeller <hzeller@pivotal.io> Co-authored-by: NSambitesh Dash <sdash@pivotal.io>
7982bd27