Committed by Denis Smirnov
The problem was first detected in Greenplum:
```
create table ta(a int check(a = 1));
insert into ta values(null);

set optimizer=off;
select * from ta where a is null;
 a
---

(1 row)

set optimizer=on;
select * from ta where a is null;
 a
---
(0 rows)

set optimizer_print_query = on;
set client_min_messages='log';
select * from ta where a is null;

Algebrized query:
+--CLogicalSelect
   |--CLogicalGet "ta" ("ta"), Columns: ["a" (0), "ctid" (1), "xmin" (2), "cmin" (3), "xmax" (4), "cmax" (5), "tableoid" (6), "gp_segment_id" (7)] Key sets: {[1,7]}
   +--CScalarNullTest
      +--CScalarIdent "a" (0)

Algebrized preprocessed query:
+--CLogicalConstTableGet Columns: ["a" (0), "ctid" (1), "xmin" (2), "cmin" (3), "xmax" (4), "cmax" (5), "tableoid" (6), "gp_segment_id" (7)] Values: []
```
So ORCA preprocesses the logical plan into a single CLogicalConstTableGet node, which means "there is no suitable data in this node, so there is no need to retrieve data or construct a physical plan".

This is wrong for table ta: it can contain nulls, because the check a = 1 does not reject them. select null = 1 returns null, not false, and a CHECK constraint rejects a row only when its expression evaluates to false.

It was easy to guess that the problem was connected with null handling in constraints that compare a column with a constant. Debugging showed that the CLogicalConstTableGet node is formed in CExpressionPreprocessor.cpp, and that in the current example the max cardinality is zero. Here is the pipeline that caused the problem:

CLogicalSelect.cpp -> CConstraintInterval.cpp

The problem is in the m_fIncludesNull parameter, which describes whether nulls are allowed by a constraint. At the moment it is always false, but it should be false only if the corresponding column is NOT NULL; otherwise it should be true. Under the hood, m_fIncludesNull is set in the function PciIntervalFromColConstCmp, which transforms a constraint comparing a column with a constant into an interval constraint.
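The role of the null flag described above can be illustrated with a minimal C++17 sketch. This is not ORCA code: `IntervalConstraint`, `FromCheckEqAlwaysFalse`, `FromCheckEq`, and `CanPruneIsNullFilter` are hypothetical names invented for this example, and the model covers only a single int column. It contrasts a conversion that always clears the flag with one that derives it from the column's nullability.

```cpp
#include <optional>

// Hypothetical, simplified model of an interval constraint on one int column.
// None of these names are ORCA's real API.
struct IntervalConstraint {
    int lo;              // inclusive lower bound
    int hi;              // inclusive upper bound
    bool includes_null;  // plays the role of m_fIncludesNull

    // True if a row whose column holds `value` may exist under the constraint.
    bool Admits(const std::optional<int>& value) const {
        if (!value.has_value()) {
            return includes_null;
        }
        return lo <= *value && *value <= hi;
    }
};

// The behavior described as the bug: the null flag is always false.
IntervalConstraint FromCheckEqAlwaysFalse(int constant) {
    return IntervalConstraint{constant, constant, false};
}

// Nullability-aware variant (an assumption about the intended behavior):
// CHECK (col = c) keeps NULLs out only when the column itself is NOT NULL.
IntervalConstraint FromCheckEq(int constant, bool column_is_not_null) {
    return IntervalConstraint{constant, constant, !column_is_not_null};
}

// A filter "col IS NULL" may be folded to an empty result only when
// the constraint admits no NULLs at all.
bool CanPruneIsNullFilter(const IntervalConstraint& constraint) {
    return !constraint.Admits(std::nullopt);
}
```

In this model, `CanPruneIsNullFilter(FromCheckEqAlwaysFalse(1))` is true, which mirrors the incorrect empty result for table ta, while `CanPruneIsNullFilter(FromCheckEq(1, false))` is false, so the filter on a nullable column would still be evaluated against the data.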
So the a = 1 constraint always transforms to a in [1, 1], and m_fIncludesNull is currently always set to false for the new interval constraint.

The solution is to pass a new parameter, infer_nulls_as, to several methods to determine whether NULL values in a column qualify the row or reject it.

This PR also adds new tests that check a constant constraint on a nullable and on a non-nullable column. For example:
```
create table ta(a int check(a = 1));
insert into ta values(null);
create table tb(b int not null check(b = 1));
insert into tb values(1);
```
A similar problem occurs with the planner (set optimizer = off), see https://github.com/greenplum-db/gpdb/issues/8582. This commit fixes only the ORCA issue.

Co-authored-by: Denis Smirnov <darthunix@gmail.com>
Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io>
Co-authored-by: Hans Zeller <hzeller@pivotal.io>
95445145