Commit 35697528, authored by Bhuvnesh Chaudhary, committed by Bhuvnesh

Update optimizer output test files and bump orca to v2.75.0

This commit adds the tests corresponding to the change below, which was introduced in PQO.
For an Insert into a randomly distributed relation, a random (redistribute)
motion must exist to ensure that the inserted data is actually randomized; otherwise
all of the data is skewed onto a single segment.

Consider the following scenarios:
Scenario 1:
```
CREATE TABLE t1_random (a int) DISTRIBUTED RANDOMLY;
INSERT INTO t1_random VALUES (1), (2);
SET optimizer_print_plan=on;

EXPLAIN INSERT INTO t1_random VALUES (1), (2);
Physical plan:
+--CPhysicalDML (Insert, "t1_random"), Source Columns: ["column1" (0)], Action: ("ColRef_0001" (1))   rows:2   width:4  rebinds:1   cost:0.020884   origin: [Grp:1, GrpExpr:2]
   +--CPhysicalComputeScalar
      |--CPhysicalMotionRandom  ==> Random Motion Inserted (2)
      |  +--CPhysicalConstTableGet Columns: ["column1" (0)] Values: [(1); (2)] ==> Delivers Universal Spec (1)
      +--CScalarProjectList
         +--CScalarProjectElement "ColRef_0001" (1)
            +--CScalarConst (1)
",
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Insert  (cost=0.00..0.02 rows=1 width=4)
   ->  Result  (cost=0.00..0.00 rows=1 width=8)
         ->  Result  (cost=0.00..0.00 rows=1 width=4) ==> Random Motion Converted to a Result Node with Hash Filter to avoid duplicates (4)
               ->  Values Scan on "Values"  (cost=0.00..0.00 rows=1 width=4) ==> Delivers Universal Spec (3)
 Optimizer: PQO version 2.70.0
```

When an Insert into t1_random is requested from a Universal source, the optimization
framework does add a Random Motion (CPhysicalMotionRandom, see (2) above) to
redistribute the data. However, since CPhysicalConstTableGet (Values Scan)
delivers a Universal spec, i.e. every segment produces the full set of rows, the
motion is converted into a Result node with a hash filter during the DXL to
Planned Statement translation, to avoid duplicates (see (4) above). As a result, no
redistribute motion remains to spray the data randomly across the segments: the
hash filter lets the rows from only one segment propagate further, and the DML node
inserts them there. All of the data therefore ends up on a single segment.

Scenario 2:
```
CREATE TABLE t1_random (a int) DISTRIBUTED RANDOMLY;
CREATE TABLE t2_random (a int) DISTRIBUTED RANDOMLY;
EXPLAIN INSERT INTO t1_random SELECT * FROM t2_random;
Physical plan:
+--CPhysicalDML (Insert, "t1_random"), Source Columns: ["a" (0)], Action: ("ColRef_0008" (8))   rows:1   width:34  rebinds:1   cost:431.010436   origin: [Grp:1, GrpExpr:2]
   +--CPhysicalComputeScalar   rows:1   width:1  rebinds:1   cost:431.000011   origin: [Grp:5, GrpExpr:1]
      |--CPhysicalTableScan "t2_random" ("t2_random")   rows:1   width:34  rebinds:1   cost:431.000006   origin: [Grp:0, GrpExpr:1]
      +--CScalarProjectList   origin: [Grp:4, GrpExpr:0]
         +--CScalarProjectElement "ColRef_0008" (8)   origin: [Grp:3, GrpExpr:0]
            +--CScalarConst (1)   origin: [Grp:2, GrpExpr:0]
",
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Insert  (cost=0.00..431.01 rows=1 width=4)
   ->  Result  (cost=0.00..431.00 rows=1 width=8)
         ->  Table Scan on t1_random  (cost=0.00..431.00 rows=1 width=4)
 Optimizer: PQO version 2.70.0
```

When an Insert into t1_random (a randomly distributed table) is requested from
another randomly distributed table, the optimization framework does not add a
redistribute motion, because CPhysicalDML requests a Random spec and the source
t2_random delivers a Random distribution spec, which satisfies the requested
spec.

In summary, there are two causes of skew in the data inserted into a randomly distributed table:
Scenario 1: the Random Motion is converted into a Result node with a hash filter.
Scenario 2: the requested and derived specs match, so no motion is added.

This patch fixes the issue by ensuring that, for an insert into a randomly
distributed table, there is always a random motion below the Insert that is enforced
by a true motion, i.e. one that actually redistributes the data. This is achieved in 2 steps:
1. CPhysicalDML (Insert) on a randomly distributed table requests a Strict Random spec.
2. When the enforcer for a Random spec is appended, it tracks whether the enforced
motion node has a Universal child: if it does, a boolean flag is set to false, otherwise to true.

Characteristics of the Strict Random spec:
1. A Strict Random spec matches (satisfies) only a Strict Random spec.
2. A Random spec enforced by a true motion also satisfies a Strict Random spec.

CPhysicalDML always has a CPhysicalComputeScalar node below it, which projects
the additional action column indicating that the operation is an Insert. The
Strict Random request is not propagated below CPhysicalComputeScalar; instead,
CPhysicalComputeScalar requests a (non-strict) Random spec from its child. This
guarantees that a true random motion exists: either one is already present among
the children of CPhysicalDML, or one is inserted between CPhysicalDML and
CPhysicalComputeScalar. Any motion in the same group as CPhysicalComputeScalar
likewise requests a Random spec from its children.

Plans after the fix
Scenario 1:
```
EXPLAIN INSERT INTO t1_random VALUES (1), (2);
Physical plan:
+--CPhysicalDML (Insert, "t1_random"), Source Columns: ["column1" (0)], Action: ("ColRef_0001" (1))   rows:2   width:4  rebinds:1   cost:0.020884   origin: [Grp:1, GrpExpr:2]
   +--CPhysicalMotionRandom   rows:2   width:1  rebinds:1   cost:0.000051   origin: [Grp:5, GrpExpr:2] ==> Strict Random Motion Enforcing true randomness
      +--CPhysicalComputeScalar   rows:2   width:1  rebinds:1   cost:0.000034   origin: [Grp:5, GrpExpr:1]
         |--CPhysicalMotionRandom   rows:2   width:4  rebinds:1   cost:0.000029   origin: [Grp:0, GrpExpr:2]
         |  +--CPhysicalConstTableGet Columns: ["column1" (0)] Values: [(1); (2)]   rows:2   width:4  rebinds:1   cost:0.000008   origin: [Grp:0, GrpExpr:1]
         +--CScalarProjectList   origin: [Grp:4, GrpExpr:0]
            +--CScalarProjectElement "ColRef_0001" (1)   origin: [Grp:3, GrpExpr:0]
               +--CScalarConst (1)   origin: [Grp:2, GrpExpr:0]
",
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Insert  (cost=0.00..0.02 rows=1 width=4)
   ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..0.00 rows=1 width=8) ==> Strict Random Motion Enforcing true randomness
         ->  Result  (cost=0.00..0.00 rows=1 width=8)
               ->  Result  (cost=0.00..0.00 rows=1 width=4)
                     ->  Values Scan on "Values"  (cost=0.00..0.00 rows=1 width=4)
 Optimizer: PQO version 2

```

Scenario 2:
```
EXPLAIN INSERT INTO t1_random SELECT * FROM t1_random;
Physical plan:
+--CPhysicalDML (Insert, "t1_random"), Source Columns: ["a" (0)], Action: ("ColRef_0008" (8))   rows:2   width:34  rebinds:1   cost:431.020873   origin: [Grp:1, GrpExpr:2]
   +--CPhysicalMotionRandom   rows:2   width:1  rebinds:1   cost:431.000039   origin: [Grp:5, GrpExpr:2] ==> Strict Random Motion Enforcing True Randomness
      +--CPhysicalComputeScalar   rows:2   width:1  rebinds:1   cost:431.000023   origin: [Grp:5, GrpExpr:1]
         |--CPhysicalTableScan "t1_random" ("t1_random")   rows:2   width:34  rebinds:1   cost:431.000012   origin: [Grp:0, GrpExpr:1]
         +--CScalarProjectList   origin: [Grp:4, GrpExpr:0]
            +--CScalarProjectElement "ColRef_0008" (8)   origin: [Grp:3, GrpExpr:0]
               +--CScalarConst (1)   origin: [Grp:2, GrpExpr:0]
",
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Insert  (cost=0.00..431.02 rows=1 width=4)
   ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=8) ==> Strict Random Motion Enforcing true randomness
         ->  Result  (cost=0.00..431.00 rows=1 width=8)
               ->  Table Scan on t1_random  (cost=0.00..431.00 rows=1 width=4)
 Optimizer: PQO version 2.70.0
(5 rows)
```

Note: if the Insert into a randomly distributed table reads from a hash-distributed
table, no additional redistribute motion is enforced between CPhysicalDML and
CPhysicalComputeScalar, because a true random motion already exists below
CPhysicalComputeScalar due to the mismatch between the requested random spec and the delivered hash spec.
```
pivotal=# explain insert into t1_random select * from t1_hash;
LOG:  statement: explain insert into t1_random select * from t1_hash;
LOG:  2018-08-20 13:31:01:135438 PDT,THD000,TRACE,"
Physical plan:
+--CPhysicalDML (Insert, "t1_random"), Source Columns: ["a" (0)], Action: ("ColRef_0008" (8))   rows:100   width:34  rebinds:1   cost:432.043222   origin: [Grp:1, GrpExpr:2]
   +--CPhysicalComputeScalar   rows:100   width:1  rebinds:1   cost:431.001555   origin: [Grp:5, GrpExpr:1]
      |--CPhysicalMotionRandom   rows:100   width:34  rebinds:1   cost:431.001289   origin: [Grp:0, GrpExpr:2]
      |  +--CPhysicalTableScan "t1_hash" ("t1_hash")   rows:100   width:34  rebinds:1   cost:431.000623   origin: [Grp:0, GrpExpr:1]
      +--CScalarProjectList   origin: [Grp:4, GrpExpr:0]
         +--CScalarProjectElement "ColRef_0008" (8)   origin: [Grp:3, GrpExpr:0]
            +--CScalarConst (1)   origin: [Grp:2, GrpExpr:0]
",
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Insert  (cost=0.00..432.04 rows=34 width=4)
   ->  Result  (cost=0.00..431.00 rows=34 width=8)
         ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=34 width=4)
               ->  Table Scan on t1_hash  (cost=0.00..431.00 rows=34 width=4)
 Optimizer: PQO version 2.70.0
(5 rows)

Time: 69.309 ms
```
Parent: cc797b96
@@ -40,10 +40,10 @@ AC_RUN_IFELSE([AC_LANG_PROGRAM([[
#include <string.h>
]],
[
-return strncmp("2.74.", GPORCA_VERSION_STRING, 5);
+return strncmp("2.75.", GPORCA_VERSION_STRING, 5);
])],
[AC_MSG_RESULT([[ok]])],
-[AC_MSG_ERROR([Your ORCA version is expected to be 2.74.XXX])]
+[AC_MSG_ERROR([Your ORCA version is expected to be 2.75.XXX])]
)
AC_LANG_POP([C++])
])# PGAC_CHECK_ORCA_VERSION
......
@@ -13526,7 +13526,7 @@ int
main ()
{
-return strncmp("2.74.", GPORCA_VERSION_STRING, 5);
+return strncmp("2.75.", GPORCA_VERSION_STRING, 5);
;
return 0;
@@ -13536,7 +13536,7 @@ if ac_fn_cxx_try_run "$LINENO"; then :
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: ok" >&5
$as_echo "ok" >&6; }
else
-as_fn_error $? "Your ORCA version is expected to be 2.74.XXX" "$LINENO" 5
+as_fn_error $? "Your ORCA version is expected to be 2.75.XXX" "$LINENO" 5
fi
rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \
......
[requires]
-orca/v2.74.0@gpdb/stable
+orca/v2.75.0@gpdb/stable
[imports]
include, * -> build/include
......
@@ -121,7 +121,7 @@ sync_tools: opt_write_test /opt/releng/apache-ant
-Divyrepo.user=$(IVYREPO_USER) -Divyrepo.passwd="$(IVYREPO_PASSWD)" -quiet resolve);
ifeq "$(findstring aix,$(BLD_ARCH))" ""
-LD_LIBRARY_PATH='' wget --no-check-certificate -q -O - https://github.com/greenplum-db/gporca/releases/download/v2.74.0/bin_orca_centos5_release.tar.gz | tar zxf - -C $(BLD_TOP)/ext/$(BLD_ARCH)
+LD_LIBRARY_PATH='' wget --no-check-certificate -q -O - https://github.com/greenplum-db/gporca/releases/download/v2.75.0/bin_orca_centos5_release.tar.gz | tar zxf - -C $(BLD_TOP)/ext/$(BLD_ARCH)
endif
clean_tools: opt_write_test
......
@@ -1111,6 +1111,7 @@ INFO: Distributed transaction command 'Distributed Prepare' to ALL contents
INFO: Distributed transaction command 'Distributed Commit Prepared' to ALL contents
insert into dd_random select g, g%15 from generate_series(1, 100) g;
INFO: Dispatch command to ALL contents
+INFO: Dispatch command to ALL contents
INFO: Distributed transaction command 'Distributed Prepare' to ALL contents
INFO: Distributed transaction command 'Distributed Commit Prepared' to ALL contents
-- non hash distributed tables
......
@@ -259,6 +259,7 @@ INFO: Distributed transaction command 'Distributed Prepare' to ALL contents
INFO: Distributed transaction command 'Distributed Commit Prepared' to ALL contents
insert into boolean values ('t', 1);
INFO: Dispatch command to ALL contents
+INFO: Dispatch command to ALL contents
INFO: Distributed transaction command 'Distributed Prepare' to ALL contents
INFO: Distributed transaction command 'Distributed Commit Prepared' to ALL contents
alter table boolean set distributed by (boo, b);
......