    Wrong preprocessed query for nullable constraints (#527) · 95445145
    Committed by Denis Smirnov
    The problem was first detected in Greenplum:
    
    ```
    create table ta(a int check(a = 1));
    insert into ta values(null);
    
    set optimizer=off;
    select * from ta where a is null;
     a
    ---
    
    (1 row)
    
    set optimizer=on;
    select * from ta where a is null;
     a
    ---
    (0 rows)
    
    set optimizer_print_query = on;
    set client_min_messages='log';
    select * from ta where a is null;
    Algebrized query:
    +--CLogicalSelect
       |--CLogicalGet "ta" ("ta"), Columns: ["a" (0), "ctid" (1), "xmin" (2), "cmin" (3), "xmax" (4), "cmax" (5), "tableoid" (6), "gp_segment_id" (7)] Key sets: {[1,7]}
       +--CScalarNullTest
          +--CScalarIdent "a" (0)
    
    Algebrized preprocessed query:
    +--CLogicalConstTableGet Columns: ["a" (0), "ctid" (1), "xmin" (2), "cmin" (3), "xmax" (4), "cmax" (5), "tableoid" (6), "gp_segment_id" (7)] Values: []
    ```
    
    ORCA collapses the logical plan into a single CLogicalConstTableGet
    node with no values, which means "no rows can match, so there is no
    need to retrieve data or build a physical plan". This is wrong for
    table ta: it can contain nulls, because check (a = 1) does not reject
    them. select null = 1 returns null rather than false, and a check
    constraint rejects a row only when its expression evaluates to false.
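
    The three-valued logic behind this can be sketched in a few lines of
    C++. This is only an illustration: SqlBool, SqlInt, sql_eq,
    check_accepts and where_keeps are hypothetical names invented here,
    not ORCA or Greenplum code, and NULL is modelled as an empty
    std::optional (C++17).

    ```cpp
    #include <cassert>
    #include <optional>

    // Hypothetical model: a SQL value is either present or NULL (std::nullopt).
    using SqlBool = std::optional<bool>;
    using SqlInt = std::optional<int>;

    // SQL equality: any comparison involving NULL yields NULL.
    SqlBool sql_eq(SqlInt lhs, SqlInt rhs) {
        if (!lhs || !rhs) return std::nullopt;
        return *lhs == *rhs;
    }

    // A CHECK constraint rejects a row only when the predicate is false;
    // both true and NULL let the row in.
    bool check_accepts(SqlBool predicate) {
        return !predicate.has_value() || *predicate;
    }

    // A WHERE clause keeps a row only when the predicate is true.
    bool where_keeps(SqlBool predicate) {
        return predicate.has_value() && *predicate;
    }

    // Usage: the row (a = NULL) inserted into ta above.
    void demo_null_row() {
        SqlBool predicate = sql_eq(std::nullopt, 1);  // NULL = 1
        assert(!predicate.has_value());    // the comparison yields NULL
        assert(check_accepts(predicate));  // so check (a = 1) accepts the row
    }
    ```

    The asymmetry between check_accepts and where_keeps is the point: the
    same NULL result admits a row under a check constraint but filters it
    out under a where clause.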
    
    The problem was evidently connected with null handling in constraints
    that compare a column with a constant. Debugging showed that the
    CLogicalConstTableGet node is created in CExpressionPreprocessor.cpp,
    and that in this example the max cardinality is zero.
    
    The code path that caused the problem is:
    CLogicalSelect.cpp -> CConstraintInterval.cpp
    
    The root cause is the m_fIncludesNull member, which describes whether
    a constraint admits nulls. At the moment it is always false, but it
    should be false only if the corresponding column is declared not
    null; otherwise it should be true.
    
    Under the hood, m_fIncludesNull is set in the function
    PciIntervalFromColConstCmp, which transforms a constraint comparing a
    column with a constant into an interval constraint. So the constraint
    a = 1 always becomes a in [1, 1], and m_fIncludesNull is currently
    always set to false for the new interval constraint.
    
    The solution is to pass a new parameter, infer_nulls_as, to several
    methods to determine whether a NULL value in a column qualifies the
    row or rejects it.
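
    The effect of the flag can be sketched as follows. This is a
    simplified stand-in, not the actual ORCA code: IntervalConstraint,
    interval_from_col_const_eq and is_null_filter_is_contradiction are
    hypothetical names, and only the ideas of an includes-null flag and
    an infer_nulls_as argument are taken from the description above.

    ```cpp
    #include <cassert>

    // Hypothetical, simplified interval constraint on a single int column.
    struct IntervalConstraint {
        int lo;              // inclusive lower bound
        int hi;              // inclusive upper bound
        bool includes_null;  // does a NULL value satisfy the constraint?
    };

    // Turn `col = value` into the interval [value, value]. The caller
    // states how a NULL in the column should be treated instead of the
    // flag being hard-coded to false.
    IntervalConstraint interval_from_col_const_eq(int value, bool infer_nulls_as) {
        return IntervalConstraint{value, value, infer_nulls_as};
    }

    // `col IS NULL` can match no row (max cardinality zero) exactly when
    // the constraint on the column excludes NULL.
    bool is_null_filter_is_contradiction(const IntervalConstraint& check) {
        return !check.includes_null;
    }

    // Usage: old behaviour versus the behaviour wanted for a nullable column.
    void demo_infer_nulls_as() {
        // Flag forced to false: `a is null` looks contradictory, so the
        // query would be collapsed into an empty result.
        assert(is_null_filter_is_contradiction(interval_from_col_const_eq(1, false)));
        // NULL qualifies the row for check (a = 1): no contradiction.
        assert(!is_null_filter_is_contradiction(interval_from_col_const_eq(1, true)));
    }
    ```

    In this sketch a check constraint on a nullable column would pass
    true, while a constraint on a not null column would pass false.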
    
    This PR also adds new tests that check a constant constraint on a
    nullable and on a non-nullable column.
    
    For example:
    
    ```
    create table ta(a int check(a = 1));
    insert into ta values(null);
    
    create table tb(b int not null check(b = 1));
    insert into tb values(1);
    ```
    
    A similar problem occurs with the planner (set optimizer = off); see
    https://github.com/greenplum-db/gpdb/issues/8582.
    This commit fixes only the ORCA issue.
    Co-authored-by: Denis Smirnov <darthunix@gmail.com>
    Co-authored-by: Shreedhar Hardikar <shardikar@pivotal.io>
    Co-authored-by: Hans Zeller <hzeller@pivotal.io>