Support "NDV-preserving" function and op property (#10247)
Orca uses this property for cardinality estimation of joins. For example, a join predicate foo join bar on foo.a = upper(bar.b) will have a cardinality estimate similar to foo join bar on foo.a = bar.b.

Other functions, as in foo join bar on foo.a = substring(bar.b, 1, 1), won't be treated that way, since they are more likely to have a greater effect on join cardinalities.

Since this is specific to ORCA, we use logic in the translator to determine whether a function or operator is NDV-preserving. Right now, we consider a very limited set of operators; we may add more at a later time.

Let's assume that we join tables R and S and that f is a function or expression that refers to a single column and does not preserve NDVs. Let's also assume that p is a function or expression that also refers to a single column and that does preserve NDVs:

join predicate       card. estimate                         comment
-------------------  -------------------------------------  -----------------------------
col1 = col2          |R| * |S| / max(NDV(col1), NDV(col2))  build an equi-join histogram
f(col1) = p(col2)    |R| * |S| / NDV(col2)                  use NDV-based estimation
f(col1) = col2       |R| * |S| / NDV(col2)                  use NDV-based estimation
p(col1) = col2       |R| * |S| / max(NDV(col1), NDV(col2))  use NDV-based estimation
p(col1) = p(col2)    |R| * |S| / max(NDV(col1), NDV(col2))  use NDV-based estimation
otherwise            |R| * |S| * 0.4                        this is an unsupported pred

Note that adding casts to these expressions is OK, as is switching the left and right sides.

Here is a list of expressions that we currently treat as NDV-preserving:

coalesce(col, const)
col || const
lower(col)
trim(col)
upper(col)

One more note: We need the NDVs of the inner side of Semi and Anti-joins for cardinality estimation, so only normal columns and NDV-preserving functions are allowed in that case.
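The estimation rules above can be illustrated with a small Python sketch. This is not ORCA code: the names Side, NDV_PRESERVING and estimate_join_rows are hypothetical, the sketch assumes the second argument of coalesce and || is a constant, it treats "both sides non-preserving" as the only "otherwise" case, and it applies the NDV formula for col1 = col2 instead of building an equi-join histogram.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical model of the expressions listed as NDV-preserving above.
# "coalesce" and "||" are assumed to take a constant as their other argument.
NDV_PRESERVING = {"coalesce", "||", "lower", "trim", "upper"}

# Default factor taken from the "otherwise" row of the table.
UNSUPPORTED_FACTOR = 0.4


@dataclass(frozen=True)
class Side:
    """One side of an equality predicate that refers to a single column."""
    column: str
    func: Optional[str] = None  # None means a bare column reference

    def preserves_ndv(self) -> bool:
        return self.func is None or self.func in NDV_PRESERVING


def estimate_join_rows(rows_r: int, rows_s: int,
                       left: Side, right: Side,
                       ndv: Dict[str, int]) -> float:
    """Apply the table row that matches the predicate left = right."""
    cross = rows_r * rows_s
    left_ok, right_ok = left.preserves_ndv(), right.preserves_ndv()
    if left_ok and right_ok:
        # col1 = col2, p(col1) = col2, p(col1) = p(col2)
        return cross / max(ndv[left.column], ndv[right.column])
    if right_ok:
        # f(col1) = col2, f(col1) = p(col2)
        return cross / ndv[right.column]
    if left_ok:
        # Same rows with left and right switched.
        return cross / ndv[left.column]
    # Neither side preserves NDVs: unsupported predicate.
    return cross * UNSUPPORTED_FACTOR


# Made-up statistics, for illustration only.
ndv = {"a": 100, "b": 200}
plain = estimate_join_rows(1000, 500, Side("a"), Side("b"), ndv)
upper = estimate_join_rows(1000, 500, Side("a"), Side("b", "upper"), ndv)
substr = estimate_join_rows(1000, 500, Side("a"), Side("b", "substring"), ndv)
```

With these made-up statistics, the upper(bar.b) predicate gets the same estimate as the plain column comparison, while the substring predicate falls back to the NDV of the other side only.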
This is a port of these GPDB 5X and GPOrca PRs:
https://github.com/greenplum-db/gporca/pull/585
https://github.com/greenplum-db/gpdb/pull/10090

This is take 2, after reverting the first attempt due to a merge conflict that caused a test to fail.