• H
    Fix the module of the hash function of a motion (#10087) · f69978d2
    Hao Wu 提交于
    After some refactors for the path/planner, the code assumes that
    the module of the hash function of a motion equals to the Gang
    size of the parent slice. However, it isn't always true, if some
    tables are distributed on part of the segments. It could happen
    during gpexpand.
    Assume the following case, there is a GPDB cluster with 2 segments,
    and the cluster is running `gpexpand` to add 1 segment. Now t1
    and t2 are distributed on the first 2 segments, t3 has finished
    data transfer, i.e. t3 is distributed on three segments.
    See the following plan:
    
    gpadmin=# explain select t1.a, t2.b from t1 join t2 on t1.a = t2.b union all select * from t3;
                                                    QUERY PLAN
    -----------------------------------------------------------------------------------------------------------
     Gather Motion 3:1  (slice1; segments: 3)  (cost=2037.25..1230798.15 rows=7499310 width=8)
       ->  Append  (cost=2037.25..1080811.95 rows=2499770 width=8)
             ->  Hash Join  (cost=2037.25..1005718.85 rows=2471070 width=8)
                   Hash Cond: (t2.b = t1.a)
                   ->  Redistribute Motion 2:3  (slice2; segments: 2)  (cost=0.00..2683.00 rows=43050 width=4)
                         Hash Key: t2.b
                         ->  Seq Scan on t2  (cost=0.00..961.00 rows=43050 width=4)
                   ->  Hash  (cost=961.00..961.00 rows=28700 width=4)
                         ->  Seq Scan on t1  (cost=0.00..961.00 rows=28700 width=4)
             ->  Seq Scan on t3  (cost=0.00..961.00 rows=28700 width=8)
     Optimizer: Postgres query optimizer
    
    The slice2 shows that t2 will redistribute to all 3 segments and join
    with t1. Since t1 is distributed only on the first 2 segments, the data
    from t2 redistributed to the third segment couldn't have a match,
    which returns wrong results.
    
    The root cause is that the module of the cdb hash to redistribute t2 is
    3, i.e. the Gang size of the parent slice. To fix this issue,
    we add a field in Motion to record the number of receivers.
    With this patch, the plan generated is:
    
    gpadmin=# explain select * from t1  join t2 on t1.a = t2.b union all select a,a,b,b from t3;
                                                    QUERY PLAN
    ----------------------------------------------------------------------------------------------------------
     Gather Motion 3:1  (slice2; segments: 3)  (cost=2.26..155.16 rows=20 width=16)
       ->  Append  (cost=2.26..155.16 rows=7 width=16)
             ->  Hash Join  (cost=2.26..151.97 rows=7 width=16)
                   Hash Cond: (t1.a = t2.b)
                   ->  Seq Scan on t1  (cost=0.00..112.06 rows=5003 width=8)
                   ->  Hash  (cost=2.18..2.18 rows=2 width=8)
                         ->  Redistribute Motion 2:3  (slice1; segments: 2)  (cost=0.00..2.18 rows=3 width=8)
                               Hash Key: t2.b
                               Hash Module: 2
                               ->  Seq Scan on t2  (cost=0.00..2.06 rows=3 width=8)
             ->  Seq Scan on t3  (cost=0.00..3.06 rows=2 width=16)
     Optimizer: Postgres query optimizer
    
    Note: the interconnect for the redistribute motion is still 2:3,
    but the data transfer only happens in 2:2.
    
    Co-authored-by: Zhenghua Lyu zlv@pivotal.io
    Reviewed-by: NPengzhou Tang <ptang@pivotal.io>
    Reviewed-by: NJesse Zhang <jzhang@pivotal.io>
    f69978d2
readfast.c 66.8 KB