• Z
    Allow tables to be distributed on a subset of segments · 4eb65a53
    ZhangJackey 提交于
    There was an assumption in gpdb that a table's data is always
    distributed on all segments, however this is not always true for example
    when a cluster is expanded from M segments to N (N > M) all the tables
    are still on M segments, to workaround the problem we used to have to
    alter all the hash distributed tables to randomly distributed to get
    correct query results, at the cost of bad performance.
    
    Now we support table data to be distributed on a subset of segments.
    
    A new columne `numsegments` is added to catalog table
    `gp_distribution_policy` to record how many segments a table's data is
    distributed on.  By doing so we could allow DMLs on M tables, joins
    between M and N tables are also supported.
    
    ```sql
    -- t1 and t2 are both distributed on (c1, c2),
    -- one on 1 segments, the other on 2 segments
    select localoid::regclass, attrnums, policytype, numsegments
        from gp_distribution_policy;
     localoid | attrnums | policytype | numsegments
    ----------+----------+------------+-------------
     t1       | {1,2}    | p          |           1
     t2       | {1,2}    | p          |           2
    (2 rows)
    
    -- t1 and t1 have exactly the same distribution policy,
    -- join locally
    explain select * from t1 a join t1 b using (c1, c2);
                       QUERY PLAN
    ------------------------------------------------
     Gather Motion 1:1  (slice1; segments: 1)
       ->  Hash Join
             Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
             ->  Seq Scan on t1 a
             ->  Hash
                   ->  Seq Scan on t1 b
     Optimizer: legacy query optimizer
    
    -- t1 and t2 are both distributed on (c1, c2),
    -- but as they have different numsegments,
    -- one has to be redistributed
    explain select * from t1 a join t2 b using (c1, c2);
                              QUERY PLAN
    ------------------------------------------------------------------
     Gather Motion 1:1  (slice2; segments: 1)
       ->  Hash Join
             Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
             ->  Seq Scan on t1 a
             ->  Hash
                   ->  Redistribute Motion 2:1  (slice1; segments: 2)
                         Hash Key: b.c1, b.c2
                         ->  Seq Scan on t2 b
     Optimizer: legacy query optimizer
    ```
    4eb65a53
cdbmutate.c 101.0 KB