    Support "NDV-preserving" function and op property (#10247) · a4362cba
    Committed by Hans Zeller
    Orca uses this property for cardinality estimation of joins.
    For example, the join foo join bar on foo.a = upper(bar.b)
    will have a cardinality estimate similar to foo join bar on foo.a = bar.b.
    
    Other functions, like foo join bar on foo.a = substring(bar.b, 1, 1)
    won't be treated that way, since they are more likely to have a greater
    effect on join cardinalities.
    
    Since this is specific to ORCA, we use logic in the translator to determine
    whether a function or operator is NDV-preserving. Right now, we consider
    a very limited set of operators; we may add more later.
    
    Let's assume that we join tables R and S and that f is a function or
    expression that refers to a single column and does not preserve
    NDVs. Let's also assume that p is a function or expression that also
    refers to a single column and that does preserve NDVs:
    
    join predicate       card. estimate                         comment
    -------------------  -------------------------------------  -----------------------------
    col1 = col2          |R| * |S| / max(NDV(col1), NDV(col2))  build an equi-join histogram
    f(col1) = p(col2)    |R| * |S| / NDV(col2)                  use NDV-based estimation
    f(col1) = col2       |R| * |S| / NDV(col2)                  use NDV-based estimation
    p(col1) = col2       |R| * |S| / max(NDV(col1), NDV(col2))  use NDV-based estimation
    p(col1) = p(col2)    |R| * |S| / max(NDV(col1), NDV(col2))  use NDV-based estimation
    otherwise            |R| * |S| * 0.4                        unsupported predicate

    Note that adding casts to these expressions is fine, as is swapping the left and right sides.
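
The table above can be read as a single rule: divide |R| * |S| by the largest NDV among the sides that are plain columns or NDV-preserving expressions, and fall back to the 0.4 factor when neither side qualifies. The following Python sketch illustrates that reading only. It is not ORCA's code, the names `SideKind` and `estimate_join_cardinality` are hypothetical, and it assumes each side of the equality refers to a single column.

```python
from enum import Enum


class SideKind(Enum):
    """How one side of an equality join predicate is classified (hypothetical names)."""
    COLUMN = "column"            # plain column reference, e.g. col1
    PRESERVING = "preserving"    # NDV-preserving expression, e.g. upper(col1)
    OTHER = "other"              # single-column expression that does not preserve NDVs

# Default selectivity for unsupported predicates, taken from the table above.
UNSUPPORTED_SELECTIVITY = 0.4


def estimate_join_cardinality(rows_r, rows_s, ndv_left, ndv_right, left, right):
    """Estimate |R join S| for `left_expr = right_expr` following the table above.

    ndv_left / ndv_right are the NDVs of the underlying columns on each side.
    Only sides that are columns or NDV-preserving expressions contribute an NDV.
    """
    usable = {SideKind.COLUMN, SideKind.PRESERVING}
    ndvs = []
    if left in usable:
        ndvs.append(ndv_left)
    if right in usable:
        ndvs.append(ndv_right)
    if not ndvs:
        # Neither side has a trustworthy NDV: treat as an unsupported predicate.
        return rows_r * rows_s * UNSUPPORTED_SELECTIVITY
    return rows_r * rows_s / max(ndvs)


if __name__ == "__main__":
    # |R| = 1000, |S| = 500, NDV(col1) = 100, NDV(col2) = 50
    print(estimate_join_cardinality(1000, 500, 100, 50, SideKind.COLUMN, SideKind.COLUMN))
    print(estimate_join_cardinality(1000, 500, 100, 50, SideKind.OTHER, SideKind.PRESERVING))
    print(estimate_join_cardinality(1000, 500, 100, 50, SideKind.OTHER, SideKind.OTHER))
```

Because the rule only looks at which sides are usable, swapping the left and right sides gives the symmetric result, which matches the note above. The sketch does not model the histogram-based path used for col1 = col2; it only reproduces the formula in the table.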
    
    Here is a list of expressions that we currently treat as NDV-preserving:
    
    coalesce(col, const)
    col || const
    lower(col)
    trim(col)
    upper(col)
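
As a rough illustration of how such a list could be checked, here is a minimal Python sketch that matches an expression against the five shapes listed above. The `Col`, `Const`, and `Call` types and the function `is_ndv_preserving` are hypothetical and are not part of the ORCA translator. The sketch assumes the argument order shown in the list and does not handle casts or nested calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Call:
    op: str        # function or operator name, e.g. "upper" or "||"
    args: tuple    # tuple of Col / Const / Call


# Shapes taken from the list above.
UNARY_ON_COLUMN = {"lower", "trim", "upper"}
COLUMN_THEN_CONST = {"coalesce", "||"}


def is_ndv_preserving(expr):
    """Return True if expr matches one of the listed NDV-preserving shapes."""
    if not isinstance(expr, Call):
        return False
    if expr.op in UNARY_ON_COLUMN:
        return len(expr.args) == 1 and isinstance(expr.args[0], Col)
    if expr.op in COLUMN_THEN_CONST:
        return (
            len(expr.args) == 2
            and isinstance(expr.args[0], Col)
            and isinstance(expr.args[1], Const)
        )
    return False


if __name__ == "__main__":
    print(is_ndv_preserving(Call("upper", (Col("b"),))))
    print(is_ndv_preserving(Call("substring", (Col("b"), Const(1), Const(1)))))
```

With these definitions, upper(bar.b) is accepted and substring(bar.b, 1, 1) is rejected, in line with the examples earlier in this message.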
    
    One more note: We need the NDVs of the inner side of Semi and
    Anti-joins for cardinality estimation, so only normal columns and
    NDV-preserving functions are allowed in that case.
    
    This is a port of these GPDB 5X and GPOrca PRs:
    https://github.com/greenplum-db/gporca/pull/585
    https://github.com/greenplum-db/gpdb/pull/10090
    
    This is take 2, after reverting the first attempt due to a merge conflict that
    caused a test to fail.