Commits · 7cea6860f524ff3c8f4354d54a3d60778849eba5 · Greenplum / Gpdb

24 Nov, 2016 8 commits

Authored by Dhanashree Kashid on Nov 18, 2016

Union differs from UnionAll in that duplicates are removed from the
results.

Append is the same as UnionAll. In this case, just keep the naming
consistent to avoid confusion.
Signed-off-by: Xin Zhang <xzhang@pivotal.io>

7cea6860

Changed order of adding union alternative [#130041367] · fc986e7d
Authored by Bhuvnesh Chaudhary on Nov 08, 2016

fc986e7d

CTest backs off when load is high [#134476913] · a6a5cf50

Authored by Jesse Zhang on Nov 23, 2016

This commit makes CTest back off when load average is 4X the core count.
This is similar in spirit to 03af732b, greenplum-db/gpdb@c83e696, and
greenplum-db/gpos@bce4ed7 .
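
The back-off rule described above can be sketched as follows. This is a minimal illustration in Python, not the actual CTest or CI change: the function names, the strict `>` comparison, and the polling interval are assumptions; only the 4X factor comes from the commit message.

```python
import os
import time


def should_back_off(load_average: float, core_count: int, factor: float = 4.0) -> bool:
    """Return True when the load average is above factor x core count."""
    return load_average > factor * core_count


def wait_for_capacity(poll_seconds: float = 5.0) -> None:
    """Sleep until the 1-minute load average drops to the threshold or below.

    Assumption: os.getloadavg() is available (Unix-like systems only).
    """
    cores = os.cpu_count() or 1
    while should_back_off(os.getloadavg()[0], cores):
        time.sleep(poll_seconds)
```

On an 8-core machine this sketch would hold off starting new work while the 1-minute load average is above 32.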

[ci skip]

a6a5cf50

Bump GPORCA version 1.691. · 1d801fbf
Authored by Xin Zhang on Nov 23, 2016
```
Signed-off-by: HaiSheng Yuan <hyuan@pivotal.io>
```
1d801fbf

Fix extra motion on parallel append [#134466057] · 508aa31f

Authored by Dhanashree Kashid on Nov 23, 2016

```
create table t (c int) distributed by (c);
xzhang=# explain select * from t, (select * from t union all select * from t) tt where t.c = tt.c;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice4; segments: 3)  (cost=0.00..862.00 rows=1 width=16)
   ->  Hash Join  (cost=0.00..862.00 rows=1 width=16)
         Hash Cond: public.t.c = public.t.c
***         ->  Redistribute Motion 3:3  (slice3; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
               Hash Key: public.t.c
               ->  Append  (cost=0.00..431.00 rows=1 width=8)
                     ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
                           Hash Key: public.t.c
                           ->  Table Scan on t  (cost=0.00..431.00 rows=1 width=8)
                     ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
                           Hash Key: public.t.c
                           ->  Table Scan on t  (cost=0.00..431.00 rows=1 width=8)
         ->  Hash  (cost=431.00..431.00 rows=1 width=8)
               ->  Table Scan on t  (cost=0.00..431.00 rows=1 width=8)
 Settings:  optimizer=on
 Optimizer status: PQO version 1.687
(16 rows)
```
We have a redundant motion (marked by `***`) because parallel append
can only derive random distribution.

In the fix, we make parallel append follow the derivation logic of serial
append.
Signed-off-by: HaiSheng Yuan <hyuan@pivotal.io>

508aa31f

Refactoring PdsDerive from CPhysicalSerialUnionAll to CPhysicalUnionAll [#134466057] · 560a1ba4
Authored by Xin Zhang on Nov 23, 2016
```
Signed-off-by: HaiSheng Yuan <hyuan@pivotal.io>
```
560a1ba4
bump GPORCA version to 1.690 · 4aa8bcf0
Authored by Xin Zhang on Nov 23, 2016
```
Signed-off-by: HaiSheng Yuan <hyuan@pivotal.io>
```
4aa8bcf0
revert f94b2d4d and 3618c363 · ab9b8cdd
Authored by Xin Zhang on Nov 23, 2016
```
Signed-off-by: HaiSheng Yuan <hyuan@pivotal.io>
```
ab9b8cdd

23 Nov, 2016 5 commits
- Only run one installcheck at a time for now [#134476913] · 619a41d9
  Authored by Jesse Zhang and Xin Zhang on Nov 22, 2016
```
Spooky things happening
[ci skip]
```
  619a41d9
- Bound the load average to 2 processes per core [#134476913] · 03af732b
  Authored by Jesse Zhang on Nov 22, 2016
```
[ci skip]
```
  03af732b
- Ease the pain of `fly execute` [#134476913] · e09d2af0
  Authored by Jesse Zhang on Nov 22, 2016
```
[ci skip]
```
  e09d2af0
- Use Github releases for building GPDB [#134476913] · e9687938
  Authored by Jesse Zhang on Nov 22, 2016
```
[ci skip]
```
  e9687938
- Bump GPOS and ORCA to recompile [#134476913] · 787e68ae
  Authored by Jesse Zhang on Nov 22, 2016
  
  787e68ae

22 Nov, 2016 3 commits

Rename poorly named variable [#134476913] · a43edda2
Authored by Jesse Zhang on Nov 21, 2016
```
[ci skip]
```
a43edda2
Bump GPORCA version 1.688 · f94b2d4d
Authored by Xin Zhang on Nov 21, 2016
```
Signed-off-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
```
f94b2d4d

Fix extra motion on parallel append [#134466057] · 3618c363

Authored by Dhanashree Kashid on Nov 18, 2016

```
create table t (c int) distributed by (c);
xzhang=# explain select * from t, (select * from t union all select * from t) tt where t.c = tt.c;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice4; segments: 3)  (cost=0.00..862.00 rows=1 width=16)
   ->  Hash Join  (cost=0.00..862.00 rows=1 width=16)
         Hash Cond: public.t.c = public.t.c
***         ->  Redistribute Motion 3:3  (slice3; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
               Hash Key: public.t.c
               ->  Append  (cost=0.00..431.00 rows=1 width=8)
                     ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
                           Hash Key: public.t.c
                           ->  Table Scan on t  (cost=0.00..431.00 rows=1 width=8)
                     ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
                           Hash Key: public.t.c
                           ->  Table Scan on t  (cost=0.00..431.00 rows=1 width=8)
         ->  Hash  (cost=431.00..431.00 rows=1 width=8)
               ->  Table Scan on t  (cost=0.00..431.00 rows=1 width=8)
 Settings:  optimizer=on
 Optimizer status: PQO version 1.687
(16 rows)
```
We have a redundant motion (marked by `***`) because parallel append
can only derive random distribution.

In the fix, we make parallel append follow the derivation logic of serial
append.
Signed-off-by: Xin Zhang <xzhang@pivotal.io>

3618c363

18 Nov, 2016 1 commit
- Merge pull request #123 from greenplum-db/consume_new_paths_from_gpdb · 78045b58
  Authored by Corbin Halliwill on Nov 17, 2016
```
Use new paths from GPDB repo
```
  78045b58

17 Nov, 2016 1 commit

Use new paths from GPDB repo · 9290cfcf

Authored by Corbin Halliwill on Nov 16, 2016

The gpdb/concourse repo is getting cleaned up. This commit fixes paths
for consumed files that are getting moved.

[#134241435]

9290cfcf

12 Nov, 2016 7 commits

Fix memory leak of f9a9bbca · 9afad152
Authored by Xin Zhang on Nov 11, 2016
```
Signed-off-by: Omer Arap <oarap@pivotal.io>
```
9afad152
Bump GPORCA version to 1.687 · db25985f
Authored by Xin Zhang on Nov 11, 2016
```
Signed-off-by: Omer Arap <oarap@pivotal.io>
```
db25985f

Fixed outer references in groupby columns [#134091245] · f9a9bbca

Authored by Omer Arap on Nov 10, 2016

In general, it's beneficial to remove outer references from groupby,
because such a value is always constant within the subquery. For example:

```
select a from t where c in (select count(s.j) from s group by s.i, t.b)
```

`t.b` can be safely removed from the above SQL statement, because for
every execution of the IN subquery, `t.b` is constant.

However, this is an issue if the outer reference is the only groupby
column and there are NO additional aggregate functions. For example:

```
select a from t where c in (select distinct t.b from s)
```

The above statement cannot be simplified further, because the query that
results from removing the outer reference is invalid:

```
select a from t where c in (select distinct ??? from s)
```

Hence, we add additional validation in pre-processing to ensure the
rewrite is done correctly.
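
The validation rule can be sketched roughly as below. This is a simplified Python model, not GPORCA's preprocessing code: the function name and the string representation of columns are hypothetical, and only the rule itself (keep the outer reference when removing it would leave no grouping column and no aggregate) is taken from the description above.

```python
def prune_outer_refs(group_by_cols, outer_refs, has_aggregates):
    """Drop outer references from a group-by list when it is safe to do so.

    group_by_cols:  ordered list of column names in the GROUP BY / DISTINCT
    outer_refs:     set of column names that belong to the outer query
    has_aggregates: True if the subquery computes aggregate functions
    """
    kept = [col for col in group_by_cols if col not in outer_refs]
    if not kept and not has_aggregates:
        # Removing everything would leave "select distinct ??? from s",
        # so the original grouping columns are kept untouched.
        return list(group_by_cols)
    return kept
```

For the two queries above, `prune_outer_refs(["s.i", "t.b"], {"t.b"}, True)` drops `t.b`, while `prune_outer_refs(["t.b"], {"t.b"}, False)` leaves the list unchanged.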
Signed-off-by: Xin Zhang <xzhang@pivotal.io>

f9a9bbca

Bump ORCA version to 1.686 [#133827909] · c06d45f4
Authored by Jesse Zhang on Nov 11, 2016

c06d45f4

Outer Filter should not be pushed down in a partition selector in subquery with... · b35c8774

Authored by Venkatesh Raghavan on Nov 08, 2016

Outer Filter should not be pushed down in a partition selector in subquery with limit clause [#133827909]

The bug is in the derivation of the required partition propagation
specification from the child. Before this fix, we were pushing the partition
constraints below Limit, which returns wrong results.
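
Why pushing a filter below a Limit changes the result can be seen in a small Python sketch. It is only an illustration with made-up rows and a fixed row order; it does not model partition selectors or ORCA's plan operators, and both function names are hypothetical.

```python
from itertools import islice


def limit_then_filter(rows, limit, predicate):
    """Correct order: the subquery's LIMIT runs first, the outer filter after."""
    return [row for row in islice(rows, limit) if predicate(row)]


def filter_then_limit(rows, limit, predicate):
    """Buggy order: the outer filter was pushed below the LIMIT."""
    return list(islice((row for row in rows if predicate(row)), limit))


rows = [10, 20, 30, 40]


def above_15(value):
    return value > 15


correct = limit_then_filter(rows, 2, above_15)
pushed_down = filter_then_limit(rows, 2, above_15)
```

Here `correct` contains only `20`, whereas `pushed_down` contains `20` and `30`: the filter let a row through that the Limit would otherwise have cut off.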

b35c8774

Bump orca version to 1.685 · 32fb46dd
Authored by Omer Arap on Nov 11, 2016
```
Signed-off-by: Xin Zhang <xzhang@pivotal.io>
```
32fb46dd

Insert random motion for randomly distributed children for parallel append [#133978027] · 34f41433

Authored by Omer Arap on Nov 09, 2016

Parallel Union currently has 2 optimization requests. The second request is not satisfied
when the children of the parallel union are randomly distributed.

For the first request, if the output columns are not hashable columns, the
`CPhysicalParallelUnion` requires random distribution. Since the children are randomly
distributed, the `CPhysicalTableScan` already satisfies the optimization request and no
motion is created on top of `CPhysicalTableScan`.

To overcome this issue and enforce a `CPhysicalMotionRandom`, we introduce a new distribution
spec called `CDistributionSpecRandomStrict`. This distribution spec replaces the
regular `CDistributionSpecRandom` in the first optimization request, where no output column is
redistributable.
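
The difference between the regular and the strict random request can be illustrated with a toy Python model. This is not ORCA's class hierarchy or its satisfaction API: the enum names and both functions are hypothetical, and the rule that only the output of an explicit random motion satisfies the strict request is an assumption drawn from the description above.

```python
from enum import Enum, auto


class Derived(Enum):
    HASHED = auto()         # child is hash-distributed
    RANDOM = auto()         # child is randomly distributed, e.g. a table scan
    RANDOM_MOTION = auto()  # child is the output of an explicit random motion


class Required(Enum):
    RANDOM = auto()         # regular random request
    RANDOM_STRICT = auto()  # strict request: rows must come from a random motion


def satisfies(derived, required):
    """Return True if the derived distribution meets the request as-is."""
    if required is Required.RANDOM:
        return derived in (Derived.RANDOM, Derived.RANDOM_MOTION)
    return derived is Derived.RANDOM_MOTION


def needs_motion(derived, required):
    """A motion has to be enforced whenever the request is not satisfied."""
    return not satisfies(derived, required)
```

In this model a randomly distributed table scan satisfies the regular request, so no motion is added, while the strict request forces a motion on top of the same scan.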

Below is the output after this commit:
```
explain select xmin from foo union all select xmin from bar;

Physical plan:
+--CPhysicalMotionGather(master)   rows:1   width:4  rebinds:1   cost:431.000029   origin: [Grp:2, GrpExpr:3]
   +--CPhysicalParallelUnionAll   rows:1   width:4  rebinds:1   cost:431.000014   origin: [Grp:2, GrpExpr:1]
      |--CPhysicalMotionRandom   rows:1   width:38  rebinds:1   cost:431.000014   origin: [Grp:0, GrpExpr:2]
      |  +--CPhysicalTableScan "foo" ("foo")   rows:1   width:38  rebinds:1   cost:431.000007   origin: [Grp:0, GrpExpr:1]
      +--CPhysicalMotionRandom   rows:1   width:38  rebinds:1   cost:431.000014   origin: [Grp:1, GrpExpr:2]
         +--CPhysicalTableScan "bar" ("bar")   rows:1   width:38  rebinds:1   cost:431.000007   origin: [Grp:1, GrpExpr:1]

                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice3; segments: 3)  (cost=0.00..431.00 rows=1 width=4)
   ->  Append  (cost=0.00..431.00 rows=1 width=4)
         ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=4)
               ->  Table Scan on foo  (cost=0.00..431.00 rows=1 width=4)
         ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=0.00..431.00 rows=1 width=4)
               ->  Table Scan on bar  (cost=0.00..431.00 rows=1 width=4)

```
Signed-off-by: Xin Zhang <xzhang@pivotal.io>

34f41433

02 Nov, 2016 1 commit

Remove whole banning mechanism from gporca [#133570595] · 10a4e6bc

Authored by Haisheng Yuan on Nov 02, 2016

gporca has a set of banned API calls which need to be allowed with the
ALLOW_xxx macro in order for gpopt to compile. But it should be the
responsibility of the library caller (GPDB/Orca) to take care of the function call.

see discussions on greenplum-db/gpdb#1136
and https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/Mcw6JPav6h4

10a4e6bc

01 Nov, 2016 6 commits

Increase uniformity of metric prettyprinting · 9b61ad28

Authored by Daniel Gustafsson on Nov 01, 2016

The trailing colon which precedes the timing info is added elsewhere,
so remove it from the OPT string; also remove the space between time and
unit on Xforms to make it uniform with other metric printings, which
makes parsing the output via scripts easier.

9b61ad28

Fix UNION query failed to generate plan when parallel union is enabled [#132718373] · 9d58cb3d

Authored by Haisheng Yuan and Omer Arap on Nov 01, 2016

Orca failed to generate plans for the following query when
parallel union is enabled:

SELECT * FROM foo UNION SELECT * FROM bar;

This is because returning EpetRequired in the EpetDistribution of CPhysicalParallelUnionAll
causes the enforcement framework to falsely introduce motions, which can
cause Orca to fail to generate plans for some queries.

When we introduce an unnecessary CPhysicalMotionHashDistribute on top of
CPhysicalParallelUnionAll, the system actually detects that it is unnecessary.
However, the optimization flow gets corrupted because
COptimizationContexts are related to each other: one optimization
context can be a child or a parent of another optimization context in another group.

To resolve the issue, share the same logic with CPhysicalSerialUnionAll
by lifting the CPhysicalSerialUnionAll::EpetRequired code to the parent
class CPhysicalUnionAll.

Closes #116

9d58cb3d

GPORCA computes wrong statistics when filter contains <> or NOT IN a IS · 204190f9

Authored by Venkatesh Raghavan on Oct 31, 2016

NULL condition [#132928535]

Histograms have the following components:
* Histogram buckets
* Null frequency
* Distinct values that are not captured by the buckets in both the
  histograms

Consider the following scenario:

```
create table test_stat(a bigint, b integer, c varchar(1) ) distributed
by (a);

insert into test_stat select id, mod(id,100),'J' from
generate_series(1,1000000) id;

insert into test_stat select id, mod(id,100), null from
generate_series(1,1000000) id;

select * from test_stat where c <> 'J' or c is null;
```

The above query has predicates combined by an OR. The predicates
are on the same column `c`. While computing the histogram of the
OR, GPORCA does not compute the contribution of the null values and the
remaining distinct values.

In this patch we fix this issue.
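
The missing contribution can be made concrete with a small frequency model in Python. It is a sketch under simplifying assumptions, not GPORCA's histogram code: buckets are reduced to a map from value to frequency, all function and parameter names are hypothetical, and the remaining (non-bucket) values are assumed to differ from the compared constant.

```python
def neq_selectivity(bucket_freqs, value, remaining_freq=0.0):
    """Selectivity of `c <> value`: non-null values that differ from it."""
    in_buckets = sum(freq for val, freq in bucket_freqs.items() if val != value)
    return in_buckets + remaining_freq


def or_selectivity(bucket_freqs, null_freq, value, remaining_freq=0.0):
    """Selectivity of `c <> value OR c IS NULL`.

    The two predicates are disjoint (`<>` never matches NULL), so their
    selectivities add up, including the null frequency.
    """
    return neq_selectivity(bucket_freqs, value, remaining_freq) + null_freq


# test_stat from the scenario above: half the rows are 'J', half are NULL.
selectivity = or_selectivity({"J": 0.5}, null_freq=0.5, value="J")
estimated_rows = selectivity * 2_000_000
```

With the null frequency included, the estimate is 1,000,000 rows, which matches the rows inserted with a NULL `c`; dropping the null contribution would estimate 0 rows.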

204190f9

Bump ORCA version to 1.681 · 5b5a7daa
Authored by Karthikeyan Jambu Rajaraman on Oct 28, 2016

5b5a7daa
Change ExmiWarningAsError to ExmiOptimizerError.(Closes #115) · 1ac2fcd5
Authored by Karthikeyan Jambu Rajaraman on Oct 28, 2016

1ac2fcd5

Revert "GPORCA computes wrong statistics when filter contains "<> or NOT IN +... · 5cd1071f

Authored by Xin Zhang on Oct 31, 2016

Revert "GPORCA computes wrong statistics when filter contains "<> or NOT IN + a IS NULL" condition [#132928535]"

This reverts commit 1dafa7fc.
Signed-off-by: Omer Arap <oarap@pivotal.io>

5cd1071f

29 Oct, 2016 7 commits

GPORCA computes wrong statistics when filter contains "<> or NOT IN + a IS... · 1dafa7fc

Authored by Venkatesh Raghavan on Oct 28, 2016

GPORCA computes wrong statistics when filter contains "<> or NOT IN + a IS NULL" condition [#132928535]

Histograms have the following components:
* Histogram buckets
* Null frequency
* Distinct values that are not captured by the buckets in both the histograms

Consider the following scenario:

```
create table test_stat(a bigint, b integer, c varchar(1) ) distributed by (a);

insert into test_stat select id, mod(id,100),'J' from generate_series(1,1000000) id;

insert into test_stat select id, mod(id,100), null from generate_series(1,1000000) id;

select * from test_stat where c <> 'J' or c is null;
```

The above query has predicates combined by an OR. The predicates
are on the same column `c`. While computing the histogram of the
OR, GPORCA does not compute the contribution of the null values and the
remaining distinct values.

In this patch we fix this issue.

1dafa7fc

Increase ICG timeout from 90m to 120m for concourse pipeline. · 1e2f18eb
Authored by Omer Arap on Oct 28, 2016

1e2f18eb
Bump ORCA version to 1.680 [#130121327] · 849ce44f
Authored by Omer Arap on Oct 28, 2016
```
Signed-off-by: Xin Zhang <xzhang@pivotal.io>
```
849ce44f
Test for when parallel union cannot be generated [#130121327] · 3f8ea4f2
Authored by Jesse Zhang and Omer Arap on Oct 24, 2016

3f8ea4f2
Update on test files with enabling serial union [#130121327] · d50ffaee
Authored by Jesse Zhang and Omer Arap on Oct 24, 2016

d50ffaee

Enable both parallel and serial union [#130121327] · 23d4972d

Authored by Jesse Zhang and Omer Arap on Oct 24, 2016

Previously, when `optimizer_parallel_union` GUC is set, we were only
generating plans with `parallel union all` and omitting `serial union
all`. In this commit, we generate plans for both `serial union all` and
`parallel union all`.

In some cases, the enforcement framework may not allow `parallel union all` to
be part of a plan because of the newly introduced distribution specs and
the motions it generates. It is better to keep the legacy union all
implementation among the alternatives instead of omitting it
completely.
Signed-off-by: Xin Zhang <xzhang@pivotal.io>

23d4972d

Cost model favors parallel union all [#130121327] · 8056098e

Authored by Jesse Zhang and Omer Arap on Oct 24, 2016

We enable both `serial` and `parallel union all` in the search space
and we favor `parallel union all` if the `optimizer_parallel_union`
GUC is enabled. `parallel union all` is costed the same as its most
expensive child instead of summing the costs of all children.

This makes the framework choose parallel union all over
serial union all when both operators are part of valid plan
alternatives.
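
The costing difference can be sketched in a few lines of Python. This is a simplified model, not ORCA's cost model: it ignores the operators' own local cost, the function names are hypothetical, and treating a disabled GUC as "serial only" is an assumption.

```python
def serial_union_all_cost(child_costs):
    """Serial union all: children run one after another, so costs add up."""
    return sum(child_costs)


def parallel_union_all_cost(child_costs):
    """Parallel union all: costed the same as the most expensive child."""
    return max(child_costs)


def choose_union_all(child_costs, parallel_union_enabled):
    """Pick the cheapest alternative available in the search space."""
    alternatives = {"serial": serial_union_all_cost(child_costs)}
    if parallel_union_enabled:
        alternatives["parallel"] = parallel_union_all_cost(child_costs)
    return min(alternatives, key=alternatives.get)
```

With child costs of 10, 30, and 20, the serial alternative costs 60 and the parallel one 30, so the parallel operator wins whenever it is a valid alternative.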

8056098e

28 Oct, 2016 1 commit
- Bump ORCA version to 1.679 · ba07d795
  Authored by Karthikeyan Jambu Rajaraman on Oct 27, 2016
  
  ba07d795