Update optimizer output test files and bump orca to v2.75.0
This commit adds the tests corresponding to the change below, introduced in PQO. For an Insert into a randomly distributed relation, a random (redistribute) motion must exist to ensure that the inserted data is randomly distributed. Otherwise, all of the data is skewed onto one segment. Consider the following scenarios:

Scenario 1:
```
CREATE TABLE t1_random (a int) DISTRIBUTED RANDOMLY;
INSERT INTO t1_random VALUES (1), (2);
SET optimizer_print_plan=on;
EXPLAIN INSERT INTO t1_random VALUES (1), (2);

Physical plan:
+--CPhysicalDML (Insert, "t1_random"), Source Columns: ["column1" (0)], Action: ("ColRef_0001" (1)) rows:2 width:4 rebinds:1 cost:0.020884 origin: [Grp:1, GrpExpr:2]
   +--CPhysicalComputeScalar
      |--CPhysicalMotionRandom ==> Random Motion Inserted (2)
      |  +--CPhysicalConstTableGet Columns: ["column1" (0)] Values: [(1); (2)] ==> Delivers Universal Spec (1)
      +--CScalarProjectList
         +--CScalarProjectElement "ColRef_0001" (1)
            +--CScalarConst (1)

QUERY PLAN
----------------------------------------------------------------------------------------
 Insert (cost=0.00..0.02 rows=1 width=4)
   -> Result (cost=0.00..0.00 rows=1 width=8)
         -> Result (cost=0.00..0.00 rows=1 width=4) ==> Random Motion Converted to a Result Node with Hash Filter to avoid duplicates (4)
               -> Values Scan on "Values" (cost=0.00..0.00 rows=1 width=4) ==> Delivers Universal Spec (3)
 Optimizer: PQO version 2.70.0
```

When an Insert into t1_random is requested from a Universal source, the optimization framework does add a Random Motion (CPhysicalMotionRandom, see (2) above) to redistribute the data. However, because CPhysicalConstTableGet / Values Scan delivers a Universal spec, the DXL-to-Planned-Statement translation converts that motion into a Result node with a hash filter to avoid duplicates (see (4) above). No redistribute motion is left to spray the data randomly across the segments. The hash filter lets the data from only one segment propagate further, and the DML node inserts it there.
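To make the skew concrete, here is a small Python simulation of Scenario 1. It is a hypothetical model, not Greenplum or ORCA code, and it rests on these assumptions:

* The cluster has 3 segments, as in the `Motion 3:3` plans.
* The Universal Values Scan is a full copy of the rows on every segment.
* The hash filter lets exactly one segment's copy through, as described above.

Which segment survives (`chosen_segment`) is an arbitrary choice for illustration.

```python
# Hypothetical simulation of Scenario 1 (not Greenplum/ORCA code).
NUM_SEGMENTS = 3  # assumption: 3 segments, as in the "Motion 3:3" plans

def universal_source(rows, num_segments):
    """A Universal source (Values Scan) holds a full copy of the rows on every segment."""
    return {seg: list(rows) for seg in range(num_segments)}

def result_with_hash_filter(per_segment_rows, chosen_segment):
    """Assumed filter behaviour: only one segment's copy survives, so there are no duplicates."""
    return {seg: (rows if seg == chosen_segment else [])
            for seg, rows in per_segment_rows.items()}

def insert_locally(per_segment_rows):
    """Without a motion above the filter, each segment inserts only the rows it holds."""
    return {seg: len(rows) for seg, rows in per_segment_rows.items()}

source = universal_source([(1,), (2,)], NUM_SEGMENTS)
filtered = result_with_hash_filter(source, chosen_segment=0)  # arbitrary choice
print(insert_locally(filtered))  # {0: 2, 1: 0, 2: 0}
```

No row is duplicated, but every row lands on the same segment. This is the one-segment skew described above.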
This results in all the data being inserted onto a single segment.

Scenario 2:
```
CREATE TABLE t1_random (a int) DISTRIBUTED RANDOMLY;
CREATE TABLE t2_random (a int) DISTRIBUTED RANDOMLY;
EXPLAIN INSERT INTO t1_random SELECT * FROM t1_random;

Physical plan:
+--CPhysicalDML (Insert, "t1_random"), Source Columns: ["a" (0)], Action: ("ColRef_0008" (8)) rows:1 width:34 rebinds:1 cost:431.010436 origin: [Grp:1, GrpExpr:2]
   +--CPhysicalComputeScalar rows:1 width:1 rebinds:1 cost:431.000011 origin: [Grp:5, GrpExpr:1]
      |--CPhysicalTableScan "t2_random" ("t2_random") rows:1 width:34 rebinds:1 cost:431.000006 origin: [Grp:0, GrpExpr:1]
      +--CScalarProjectList origin: [Grp:4, GrpExpr:0]
         +--CScalarProjectElement "ColRef_0008" (8) origin: [Grp:3, GrpExpr:0]
            +--CScalarConst (1) origin: [Grp:2, GrpExpr:0]

QUERY PLAN
------------------------------------------------------------------------------------------
 Insert (cost=0.00..431.01 rows=1 width=4)
   -> Result (cost=0.00..431.00 rows=1 width=8)
         -> Table Scan on t1_random (cost=0.00..431.00 rows=1 width=4)
 Optimizer: PQO version 2.70.0
```

When an Insert into t1_random (a randomly distributed table) is requested from another randomly distributed table, the optimization framework does not add a redistribute motion. CPhysicalDML requests a Random spec, and the source t2_random delivers a Random distribution spec that already satisfies the request.

In summary, there are two causes of skew in data inserted into a randomly distributed table:
Scenario 1: the Random Motion is converted into a Result node with a hash filter.
Scenario 2: the requested and derived specs match, so no motion is added.

This patch fixes the issue by ensuring that an Insert into a randomly distributed table always has its random distribution enforced by a true motion. This is achieved in two steps:

1. CPhysicalDML (Insert) on a randomly distributed table must request a Strict Random spec.
2. The Append Enforcer for the Random spec must track whether the enforced motion node has a Universal child. If it does, the boolean flag recording whether the spec was enforced by a true motion is set to false; otherwise it is set to true.

Characteristics of the Strict Random spec:

1. A Strict Random spec matches / satisfies only a Strict Random spec.
2. A Random spec enforced by a true motion matches / satisfies a Strict Random spec.

CPhysicalDML always has a CPhysicalComputeScalar node below it, which projects the additional column indicating whether the row is an Insert. The Strict Random request is not propagated below CPhysicalComputeScalar; instead, CPhysicalComputeScalar requests a Random spec from its child. This mandates a true random motion, which must either already exist among the descendants of the CPhysicalDML node or be inserted between CPhysicalDML and CPhysicalComputeScalar. Any motion in the same group as CPhysicalComputeScalar must also request a Random spec from its children.

Plans after the fix

Scenario 1:
```
EXPLAIN INSERT INTO t1_random VALUES (1), (2);

Physical plan:
+--CPhysicalDML (Insert, "t1_random"), Source Columns: ["column1" (0)], Action: ("ColRef_0001" (1)) rows:2 width:4 rebinds:1 cost:0.020884 origin: [Grp:1, GrpExpr:2]
   +--CPhysicalMotionRandom rows:2 width:1 rebinds:1 cost:0.000051 origin: [Grp:5, GrpExpr:2] ==> Strict Random Motion Enforcing true randomness
      +--CPhysicalComputeScalar rows:2 width:1 rebinds:1 cost:0.000034 origin: [Grp:5, GrpExpr:1]
         |--CPhysicalMotionRandom rows:2 width:4 rebinds:1 cost:0.000029 origin: [Grp:0, GrpExpr:2]
         |  +--CPhysicalConstTableGet Columns: ["column1" (0)] Values: [(1); (2)] rows:2 width:4 rebinds:1 cost:0.000008 origin: [Grp:0, GrpExpr:1]
         +--CScalarProjectList origin: [Grp:4, GrpExpr:0]
            +--CScalarProjectElement "ColRef_0001" (1) origin: [Grp:3, GrpExpr:0]
               +--CScalarConst (1) origin: [Grp:2, GrpExpr:0]

QUERY PLAN
----------------------------------------------------------------------------------------
 Insert (cost=0.00..0.02 rows=1 width=4)
   -> Redistribute Motion 3:3 (slice1; segments: 3) (cost=0.00..0.00 rows=1 width=8) ==> Strict Random Motion Enforcing true randomness
         -> Result (cost=0.00..0.00 rows=1 width=8)
               -> Result (cost=0.00..0.00 rows=1 width=4)
                     -> Values Scan on "Values" (cost=0.00..0.00 rows=1 width=4)
 Optimizer: PQO version 2
```

Scenario 2:
```
EXPLAIN INSERT INTO t1_random SELECT * FROM t1_random;

Physical plan:
+--CPhysicalDML (Insert, "t1_random"), Source Columns: ["a" (0)], Action: ("ColRef_0008" (8)) rows:2 width:34 rebinds:1 cost:431.020873 origin: [Grp:1, GrpExpr:2]
   +--CPhysicalMotionRandom rows:2 width:1 rebinds:1 cost:431.000039 origin: [Grp:5, GrpExpr:2] ==> Strict Random Motion Enforcing True Randomness
      +--CPhysicalComputeScalar rows:2 width:1 rebinds:1 cost:431.000023 origin: [Grp:5, GrpExpr:1]
         |--CPhysicalTableScan "t1_random" ("t1_random") rows:2 width:34 rebinds:1 cost:431.000012 origin: [Grp:0, GrpExpr:1]
         +--CScalarProjectList origin: [Grp:4, GrpExpr:0]
            +--CScalarProjectElement "ColRef_0008" (8) origin: [Grp:3, GrpExpr:0]
               +--CScalarConst (1) origin: [Grp:2, GrpExpr:0]

QUERY PLAN
------------------------------------------------------------------------------------------
 Insert (cost=0.00..431.02 rows=1 width=4)
   -> Redistribute Motion 3:3 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=8) ==> Strict Random Motion Enforcing true randomness
         -> Result (cost=0.00..431.00 rows=1 width=8)
               -> Table Scan on t1_random (cost=0.00..431.00 rows=1 width=4)
 Optimizer: PQO version 2.70.0
(5 rows)
```

Note: when the Insert into a randomly distributed table reads from a hash-distributed table, no additional redistribute motion is enforced between CPhysicalDML and CPhysicalComputeScalar. A true random motion already exists below CPhysicalComputeScalar, because the Random spec requested by CPhysicalComputeScalar does not match the Hash spec delivered by the source.
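The two steps and two characteristics above can be sketched as a small Python model. The following names are hypothetical stand-ins for ORCA's C++ distribution-spec classes, not its actual API:

* the classes `RandomSpec` and `StrictRandomSpec`
* the `enforced_by_true_motion` field
* the `satisfies` and `enforce_random_motion` helpers

The model makes three further assumptions:

* Only the Random and Strict Random specs are modelled.
* A Random spec derived directly from a base table scan has the flag set to false.
* Characteristic 1 is read literally, so a Strict Random spec does not satisfy a plain Random request.

```python
# Hypothetical model of the Random / Strict Random specs (not ORCA's C++ API).
from dataclasses import dataclass

@dataclass(frozen=True)
class RandomSpec:
    # True only if a real redistribute motion produced this spec. A motion over a
    # Universal child becomes a Result node with a hash filter, so it does not count.
    enforced_by_true_motion: bool = False

@dataclass(frozen=True)
class StrictRandomSpec:
    pass

def satisfies(derived, required):
    """Return True if the derived spec satisfies the required spec."""
    if isinstance(required, StrictRandomSpec):
        # Characteristic 1: Strict Random satisfies Strict Random.
        # Characteristic 2: so does a Random spec enforced by a true motion.
        return isinstance(derived, StrictRandomSpec) or (
            isinstance(derived, RandomSpec) and derived.enforced_by_true_motion)
    if isinstance(required, RandomSpec):
        # Any Random spec, flagged or not, satisfies a plain Random request.
        # Strict Random satisfies Strict Random only, so it is excluded here.
        return isinstance(derived, RandomSpec)
    return False

def enforce_random_motion(child_is_universal):
    """Step 2: the enforcer records whether the added motion is a true motion."""
    return RandomSpec(enforced_by_true_motion=not child_is_universal)

# Scenario 2: a random table scan satisfies a plain Random request...
assert satisfies(RandomSpec(), RandomSpec())
# ...but not the Strict Random request that CPhysicalDML now makes (step 1).
assert not satisfies(RandomSpec(), StrictRandomSpec())
# Scenario 1: a motion over a Universal child does not count as a true motion.
assert not satisfies(enforce_random_motion(child_is_universal=True), StrictRandomSpec())
assert satisfies(enforce_random_motion(child_is_universal=False), StrictRandomSpec())
```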
```
pivotal=# explain insert into t1_random select * from t1_hash;
LOG: statement: explain insert into t1_random select * from t1_hash;
LOG: 2018-08-20 13:31:01:135438 PDT,THD000,TRACE,"
Physical plan:
+--CPhysicalDML (Insert, "t1_random"), Source Columns: ["a" (0)], Action: ("ColRef_0008" (8)) rows:100 width:34 rebinds:1 cost:432.043222 origin: [Grp:1, GrpExpr:2]
   +--CPhysicalComputeScalar rows:100 width:1 rebinds:1 cost:431.001555 origin: [Grp:5, GrpExpr:1]
      |--CPhysicalMotionRandom rows:100 width:34 rebinds:1 cost:431.001289 origin: [Grp:0, GrpExpr:2]
      |  +--CPhysicalTableScan "t1_hash" ("t1_hash") rows:100 width:34 rebinds:1 cost:431.000623 origin: [Grp:0, GrpExpr:1]
      +--CScalarProjectList origin: [Grp:4, GrpExpr:0]
         +--CScalarProjectElement "ColRef_0008" (8) origin: [Grp:3, GrpExpr:0]
            +--CScalarConst (1) origin: [Grp:2, GrpExpr:0]
",
QUERY PLAN
-------------------------------------------------------------------------------------------------
 Insert (cost=0.00..432.04 rows=34 width=4)
   -> Result (cost=0.00..431.00 rows=34 width=8)
         -> Redistribute Motion 3:3 (slice1; segments: 3) (cost=0.00..431.00 rows=34 width=4)
               -> Table Scan on t1_hash (cost=0.00..431.00 rows=34 width=4)
 Optimizer: PQO version 2.70.0
(5 rows)

Time: 69.309 ms
```
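The following hypothetical Python sketch puts the requests together. It derives the bottom-up operator chain for the three sources discussed above: a Universal Values Scan, a randomly distributed table and a hash-distributed table. It is a simplified model, not ORCA's search:

* Specs are plain strings.
* `"random*"` is an invented marker for a Random spec enforced by a true motion.
* CPhysicalComputeScalar is assumed to pass its child's spec through unchanged.
* The `strict` parameter switches between the behaviour before and after this fix.

```python
# Hypothetical model of how the requests shape the Insert plans (not ORCA code).
# "random*" marks a Random spec that was enforced by a true motion.

def satisfies(derived, required):
    if required == "strict_random":
        return derived == "random*"
    if required == "random":
        return derived in ("random", "random*")
    return derived == required

def add_random_motion(plan, derived):
    # A motion over a Universal child ends up as a Result with a hash filter,
    # so only a motion over a non-Universal child yields "random*".
    return plan + ["RandomMotion"], ("random" if derived == "universal" else "random*")

def build_insert_plan(source_op, source_spec, strict=True):
    plan, derived = [source_op], source_spec
    # CPhysicalComputeScalar requests a plain Random spec from its child.
    if not satisfies(derived, "random"):
        plan, derived = add_random_motion(plan, derived)
    plan.append("ComputeScalar")  # assumed to pass the child's spec through
    # CPhysicalDML requests Strict Random after the fix, plain Random before it.
    if not satisfies(derived, "strict_random" if strict else "random"):
        plan, derived = add_random_motion(plan, derived)
    plan.append("DML Insert")
    return plan

for op, spec in [("ConstTableGet", "universal"),
                 ("TableScan t1_random", "random"),
                 ("TableScan t1_hash", "hashed")]:
    print(build_insert_plan(op, spec, strict=False), build_insert_plan(op, spec))
```

Read bottom-up, the `strict=True` results follow the shape of the three plans shown after the fix:

* For the Values Scan, one motion sits below CPhysicalComputeScalar and one above it.
* For t1_random, a single motion sits above CPhysicalComputeScalar.
* For t1_hash, a single motion sits below CPhysicalComputeScalar.

With `strict=False`, the t1_random case has no motion at all, as in Scenario 2 before the fix.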