• H
    DPv2 Transform now handles left outer joins (#564) · 7982bd27
    Hans Zeller 提交于
    The CExpandNAryJoinDPv2 transform now recognizes the CScalarNAryJoinPredList
    operator for NAry joins containing outer joins.
    
    We now transition gracefully from exhaustive to a MinCard algorithm when
    we exceed the 10-table join limit in the DPv2 transform.
    
    Preprocessor changes:
    
    - Because we sometimes call the preprocessor multiple times on a tree
      (e.g. with CTEs), we needed to make the calls idempotent
    - Handle the new NAry join correctly in CExpressionPreprocessor::PexprOuterJoinToInnerJoin()
    - Improve constraint and max card computation for NAry join with left outer joins in it
    
    Changes to the CJoinOrderDPv2 class:
    - Changed base class, CJoinOrder to add information about which vertex a given edge
      belongs to, if it is an ON predicate
    - Added a new CJoinOrder constructor to compute this new information
    
    - New data structures to describe join levels:
    
      We still have an array of levels. Each level is now an SLevelInfo struct.
      Each SLevelInfo object has an array of SGroupInfo structs.
      Each SGroupInfo struct has a bitset of atoms (a component) and
      an expression (SExpressionInfo). The SExpressionInfo now has pointers to
      child SGroupInfos, in addition to the expression itself, to help computing
      the cost.
    
    - Cost now has the cardinality of the join itself. To avoid computing stats
      too many times, the stats are stored in each group and we have a new map to
      look up already computed stats, given a bitset (BitSetToGroupInfoMap).
    
    - The cost function also penalizes cross-products, similar to what the NLJ
      penalty (EcpNLJFactor) does.
    
    - We no longer make a list of CExpressionArrays and then pick the cheapest
      of each ExpressionArray. Instead, we delete expressions that aren't the
      cheapest in their bitset immediately. To do that, we store all the
      good expressions in the CJoinOrderDPv2, not in temporary variables.
    
    - We limit the number of groups in a given level to the number of groups
      in a 10-way join. In other words, joins up to 10-way should be enumerated
      exhaustively, larger joins will be restricted and levels greater than 10
      have only one group, like for MinCard.
    
    - To implement the limits for groups and also for the top k expression at
      the highest level, we added a k-heap template struct.
    
    Changes to statistics estimation:
    
    - Use stats expression for children when deriving stats in higher groups
    
    - Use histograms to compute the NDVs for join columns, don't override the
      scale factor later with the global NDVs (that ignores the effect of some
      predicates)
    
    - We will also need a fix to the join damping factor, see
      https://github.com/greenplum-db/gporca/pull/561Co-authored-by: NSambitesh Dash <sdash@pivotal.io>
    Co-authored-by: NHans Zeller <hzeller@pivotal.io>
    Co-authored-by: NSambitesh Dash <sdash@pivotal.io>
    7982bd27
CMakeLists.txt 4.8 KB