Commits · e72d01d0f8a2464fe03e682337f85189fe99314e · Greenplum / Gpdb

04 May, 2017 6 commits

Address pylint warnings and errors (gpconfig | gpexpand) (#2348) · e72d01d0

Committed by Marbin Tan on May 03, 2017

* Address pylint warnings and errors

- Fix whitespace and indentation
- Remove unused variables
- Fix syntax errors
Signed-off-by: Marbin Tan <mtan@pivotal.io>

e72d01d0

Fix bug that partition selector may generate incomplete results for NLJ · befc063b

Committed by Haisheng Yuan on Apr 26, 2017

The fix sets material->cdb_strict to true if the outer child of the Material
node is a partition selector.  Before this patch, the following query returned
incomplete results quite often:

create table t(id int, a int);
create table pt(id int, b int) DISTRIBUTED BY (id)
PARTITION BY RANGE(b) (START (0) END (5) EVERY (1));
insert into t select i, i from generate_series(0,4) i;
insert into pt select i, i from generate_series(0,4) i;
analyze t;
analyze pt;
set enable_hashjoin=off;
set enable_nestloop=on;
select * from t, pt where a = b;

In a 3-segment cluster, it may return any of the different results shown below:
hyuan=# select * from t, pt where a = b;
 id | a | id | b
----+---+----+---
  0 | 0 |  0 | 0
  1 | 1 |  1 | 1
  2 | 2 |  2 | 2
(3 rows)

hyuan=# select * from t, pt where a = b;
 id | a | id | b
----+---+----+---
  3 | 3 |  3 | 3
  4 | 4 |  4 | 4
(2 rows)

hyuan=# select * from t, pt where a = b;
 id | a | id | b
----+---+----+---
  3 | 3 |  3 | 3
  4 | 4 |  4 | 4
  0 | 0 |  0 | 0
  1 | 1 |  1 | 1
  2 | 2 |  2 | 2
(5 rows)

Only the last one is the correct result.

The plan for the above query is:
-------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)
   ->  Nested Loop  (cost=2.27..9.00 rows=2 width=16)
         Join Filter: t.a = public.pt.b
         ->  Append  (cost=0.00..5.05 rows=2 width=8)
               ->  Result  (cost=0.00..1.01 rows=1 width=8)
                     One-Time Filter: PartSelected
                     ->  Seq Scan on pt_1_prt_1 pt
               ->  Result  (cost=0.00..1.01 rows=1 width=8)
                     One-Time Filter: PartSelected
                     ->  Seq Scan on pt_1_prt_2 pt
               ->  Result  (cost=0.00..1.01 rows=1 width=8)
                     One-Time Filter: PartSelected
                     ->  Seq Scan on pt_1_prt_3 pt
               ->  Result  (cost=0.00..1.01 rows=1 width=8)
                     One-Time Filter: PartSelected
                     ->  Seq Scan on pt_1_prt_4 pt
               ->  Result  (cost=0.00..1.01 rows=1 width=8)
                     One-Time Filter: PartSelected
                     ->  Seq Scan on pt_1_prt_5 pt
         ->  Materialize  (cost=2.27..2.42 rows=5 width=8)
               ->  Partition Selector for pt (dynamic scan id: 1)
                     Filter: t.a
                     ->  Broadcast Motion 3:3  (slice1; segments: 3)
                           ->  Seq Scan on t
 Settings:  enable_hashjoin=off; enable_nestloop=on
 Optimizer status: legacy query optimizer

The data distribution for tables t and pt in a 3-segment environment is:
hyuan=# select gp_segment_id, * from t;
 gp_segment_id | id | a
---------------+----+---
             1 |  3 | 3
             1 |  4 | 4
             0 |  0 | 0
             0 |  1 | 1
             0 |  2 | 2
(5 rows)

hyuan=# select gp_segment_id, * from pt;
 gp_segment_id | id | b
---------------+----+---
             0 |  0 | 0
             0 |  1 | 1
             0 |  2 | 2
             1 |  3 | 3
             1 |  4 | 4
(5 rows)

Tuples {0,1,2} of t and pt are in segment 0, tuples {3,4} of t and pt are in
segment 1. Segment 2 has no data for t and pt.

In this query, the planner decides to prefetch the inner child to avoid a
deadlock hazard, and the cdb_strict of Material is set to false. Let's see how
the query runs in segment 0.

1. The inner child of the nestloop join, Material, fetches one tuple from the
partition selector and then outputs it. Let's assume the output order of the
partition selector/broadcast motion is {0,1,2,3,4}. So the 1st tuple output by
the partition selector and Material is 0.

2. The partition selector decides that the selected partition for table pt is
pt_1_prt_1, because t.a = pt.b = 0 in this partition. The outer child of
nestloop join, Append, fetches 1 tuple from that partition, with pt.b=0.

3. Nestloop join continues to execute the Material of the inner child to fetch
the other tuples, 1,2,3,4, but none of these tuples from t match the join
condition, because pt.b=0. No more tuples are output by the nestloop join for
this round of the loop. However, all the partitions of pt are now matched and
selected.

4. Nestloop join fetches another tuple from pt_1_prt_2, which is 1; it matches
a tuple from the inner child, so 1 is output. It then fetches a tuple from
pt_1_prt_3, which is 2; it matches, so 2 is output. But pt_1_prt_4 and
pt_1_prt_5 have no data in this segment, so the output ends with {0,1,2} in
segment 0.

In segment 1, let's again assume the tuple output order of the partition
selector/broadcast motion is {0,1,2,3,4}. Since the first tuple output from the
inner child is 0, only pt_1_prt_1 is selected. But when the nestloop join tries
to fetch a tuple from the outer child, which in this case means fetching from
pt_1_prt_1, it gets no tuple, because pt_1_prt_1 is empty in this segment. The
nestloop join therefore concludes that the outer child is empty, so there is no
need to execute the join; it returns NULL and finishes directly.

Segment 2 has no data for t and pt, so no tuple is output. The final result
gathered on the master is therefore {0,1,2} in this case. But if the broadcast
motion output tuple order is {3,4,0,1,2}, the final result may be {3,4}. If the
broadcast motion output tuple order on segment 0 is {0,1,2,3,4}, and on segment
1 is {3,4,0,1,2}, then the final result on the master is {0,1,2,3,4}, which is
correct.

The bug is fixed by setting cdb_strict of Material to true when the planner
generates a partition selector for the inner child of a nestloop join. The
Material will then fetch all the tuples from its child and materialize them
before emitting any tuple.  Thus we can make sure the partitions of pt are
selected correctly.
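
The walkthrough above can be replayed with a small simulation. This is only a
simplified sketch in Python, not Greenplum executor code: `run_segment`,
`gather`, `SEGMENTS` and `NUM_PARTITIONS` are hypothetical names, partition
index p stands for the partition holding b == p, and the One-Time Filter is
assumed to be checked once, when Append reaches each child.

```python
NUM_PARTITIONS = 5  # pt_1_prt_1 .. pt_1_prt_5; index p holds rows with b == p

# Rows of pt stored on each segment, keyed by partition index (from the commit).
SEGMENTS = [
    {0: [0], 1: [1], 2: [2]},  # segment 0
    {3: [3], 4: [4]},          # segment 1
    {},                        # segment 2
]


def run_segment(local_pt, inner_order, strict):
    """Simulate the nestloop on one segment; return joined (a, b) pairs."""
    selected = set()    # partitions chosen so far by the partition selector
    materialized = []   # tuples of t stored by Material
    source = iter(inner_order)

    def pull_inner():
        # Move one tuple of t through the partition selector into Material.
        a = next(source, None)
        if a is None:
            return False
        selected.add(a)  # t.a == pt.b, so tuple a selects partition a
        materialized.append(a)
        return True

    # Prefetch of the inner side: everything if strict, one tuple otherwise.
    pull_inner()
    while strict and pull_inner():
        pass

    joined = []
    for part in range(NUM_PARTITIONS):   # Append visits its children in order
        if part not in selected:         # One-Time Filter: PartSelected
            continue
        for b in local_pt.get(part, []):
            while pull_inner():          # first full inner scan drains the rest
                pass
            joined.extend((a, b) for a in materialized if a == b)
    return joined


def gather(orders, strict):
    """Collect the output of all segments, as the Gather Motion would."""
    rows = []
    for local_pt, order in zip(SEGMENTS, orders):
        rows.extend(run_segment(local_pt, order, strict))
    return sorted(rows)
```

Under these assumptions, `gather([[0, 1, 2, 3, 4]] * 3, strict=False)` yields
only the rows for 0, 1 and 2, while `strict=True` yields all five rows
regardless of the broadcast order.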

RCA by Lirong Jian <jianlirong@gmail.com>

befc063b

gpperfmon: link to wiki · 64b30837
Committed by C.J. Jameson on May 03, 2017
```
[ci skip]
Signed-off-by: Marbin Tan <mtan@pivotal.io>
```
64b30837
gpperfmon: remove unused variable to quiet warnings · e2477236
Committed by Marbin Tan on May 02, 2017
```
Signed-off-by: Larry Hamel <lhamel@pivotal.io>
```
e2477236
gpperfmon: remove unused variable · 360d42c2
Committed by Larry Hamel on May 02, 2017
```
Signed-off-by: Marbin Tan <mtan@pivotal.io>
```
360d42c2
Removed EMCConnect from ext and update the change to third-party/ext 3.1 · 453aba22
Committed by Jingyi Mei on May 02, 2017
```
Signed-off-by: Tushar Dadlani <tdadlani@pivotal.io>
```
453aba22

03 May, 2017 13 commits

ALTER RESOURCE GROUP SET CONCURRENCY N · 043511e9

Committed by xiong-gang on May 03, 2017

Increasing the 'concurrency' limit takes effect immediately, and the
queued transactions can be woken up. Decreasing the 'concurrency' is
different: if the new limit is smaller than the number of currently
running transactions, the ALTER statement won't cancel running
transactions to bring them down to the limit. Therefore, we use column
'proposed' in pg_resgroupcapability to represent the effective limit,
and use column 'value' to record the historical limit.
For example, we have a resource group with concurrency=3, and there
are 3 running transactions and 3 queued transactions. If we alter the
concurrency to 2, 'proposed' will be updated to 2 and 'value' will stay
as 3. When one running transaction is finished, it won't wake up the
transactions in the queue as the current concurrency is 2. If we execute
the statement again to alter the concurrency to 2, it will update the
'value' column to 2, and the 'value' is consistent with 'proposed'
again.
Signed-off-by: Richard Guo <riguo@pivotal.io>
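
The example above can be modelled roughly as follows. This is a toy sketch in
Python, not the resource group implementation: the `ResGroup` class and its
methods are hypothetical, and the rule that 'value' catches up only when the
new limit is no smaller than the number of running transactions is an
assumption inferred from the example.

```python
from dataclasses import dataclass


@dataclass
class ResGroup:
    value: int        # 'value' column: the historical limit
    proposed: int     # 'proposed' column: the effective limit
    running: int = 0  # currently running transactions
    queued: int = 0   # transactions waiting in the queue

    def alter_concurrency(self, new_limit: int) -> None:
        # The new limit always becomes the effective one.
        self.proposed = new_limit
        # Assumption: 'value' is only updated once the running transactions
        # fit within the new limit; nothing is cancelled otherwise.
        if new_limit >= self.running:
            self.value = new_limit
            self._wake_queued()

    def finish_one(self) -> None:
        # A running transaction ends; wake queued ones if there is room.
        self.running -= 1
        self._wake_queued()

    def _wake_queued(self) -> None:
        while self.queued > 0 and self.running < self.proposed:
            self.queued -= 1
            self.running += 1
```

Starting from `ResGroup(value=3, proposed=3, running=3, queued=3)`, calling
`alter_concurrency(2)` leaves 'value' at 3 with 'proposed' at 2; after
`finish_one()` nothing is woken up, and a second `alter_concurrency(2)` brings
'value' to 2.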

043511e9

Support COPY ON SEGMENT command · 49b12f18

Committed by Adam Lee on May 03, 2017

Support a COPY statement that exports the table directly from the
segments to local files in parallel.

This commit adds the keyword "on segment" to save the copied file on
the "segment" instead of on the "master".

Two placeholders are used, "<SEG_DATA_DIR>" and "<SEGID>", which
will be replaced with the segment datadir and segment id.

E.g.

```
COPY tbl TO '/tmp/<SEG_DATA_DIR>filename<SEGID>.txt' ON SEGMENT;
```
Signed-off-by: Yuan Zhao <yuzhao@pivotal.io>
Signed-off-by: Haozhou Wang <hawang@pivotal.io>
Signed-off-by: Adam Lee <ali@pivotal.io>
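
To illustrate the placeholder substitution described above, here is a minimal
Python sketch. It only shows the textual replacement; the function name
`expand_copy_path`, the template and the datadir value are hypothetical and are
not taken from the Greenplum source.

```python
def expand_copy_path(template: str, seg_data_dir: str, seg_id: int) -> str:
    """Replace the two placeholders with one segment's datadir and id."""
    return (template
            .replace("<SEG_DATA_DIR>", seg_data_dir)
            .replace("<SEGID>", str(seg_id)))


# Hypothetical template and segment data directory, for illustration only.
example_path = expand_copy_path("<SEG_DATA_DIR>/tbl_<SEGID>.txt",
                                "/data/primary/gpseg0", 0)
```

Each segment would perform the substitution with its own values, so every
segment ends up writing to a distinct local file.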

49b12f18

fix links to Pivotal Network. · 558be9d8
Committed by mkiyama on May 02, 2017

558be9d8

removing info for older, incompatible netbackup versions (#2337) · 26b50340

Committed by David Yozie on May 02, 2017

* removing info for older, incompatible netbackup versions

* removing system requirements from backup topic; adding conditionalized reference to release notes for supported netbackup versions

26b50340

Fix libperl issues for icw on sles · 6b01df41
Committed by Jim Doty on May 02, 2017
```
Signed-off-by: Tom Meyer <tmeyer@pivotal.io>
```
6b01df41

Fix documentation after pl/java removal · fe1e5cb3

Committed by Daniel Gustafsson on May 02, 2017

We don't have pl/java in gpAux/extensions anymore and there are no
proprietary modules either.

[ci skip]

fe1e5cb3

Removing codegen from pipeline. · 607e9e09
Committed by Karthikeyan Jambu Rajaraman on May 01, 2017

607e9e09

gpperfmon: add behave test for diskspace history (#2339) · a18628a2

Committed by Larry Hamel on May 02, 2017

* add behave test for diskspace_history
* remove commented-out test (moved to new tracker story to redo)
* refactor test for qamode
Signed-off-by: C.J. Jameson <cjameson@pivotal.io>

a18628a2

gpperfmon: Add behave test for log_alert_history · b5bc853d
Committed by Larry Hamel on May 01, 2017
```
Signed-off-by: Marbin Tan <mtan@pivotal.io>
Signed-off-by: C.J. Jameson <cjameson@pivotal.io>
```
b5bc853d

Track the number of HashAgg expansions. · 771854a7
Committed by Shreedhar Hardikar on Apr 28, 2017

771854a7

Add compile clients and loaders into master and pr pipeline · a8c384e2

Committed by Jingyi Mei on May 01, 2017

We made the following changes:
1. changed make files to reference sles instead of suse, and use
sles11-x86_64 instead of sles11_x86_64 in the zip artifact
2. changed make files to use rhel6/7-x86_64 instead of RHEL6/7-x86_64 in
the zip artifact
3. changed set_bld_arch.sh to set BLD_ARCH=sles instead of suse
4. in dependencies ivy file, add sles11_x86_64 as configurations to
multiple repos which only had suseXX-x86_64 before
5. set multiple config path/flags for sles11_x86_64 in Makefile which
didn't exist before
Signed-off-by: Tom Meyer <tmeyer@pivotal.io>
Signed-off-by: Kris Macoskey <kmacoskey@pivotal.io>
Signed-off-by: Jingyi Mei <jmei@pivotal.io>

a8c384e2

removing docs (currently not used) for non-oss datadirect drivers (#2347) · 29d155bb
Committed by David Yozie on May 02, 2017

29d155bb

New reference pages from Postgres 8.3 (#2282) · a3a74f85

Committed by Jane Beckman on May 02, 2017

* Add new OPERATOR_FAMILY pages

* Updates for Postgres 8.3

* Corrections to nav map

* Correct xref pointer

* xml format fix

* Standardize xml for new xrefs

* Remove duplicate link

* Change ref to Greenplum Database

* Updates for new PostgreSQL commands ALTER/CREATE/DROP OPERATOR FAMILY, ALTER VIEW, and DISCARD.

* Updates from Heikki

* Remove 8.3 comparison

* Update GET DIAGNOSTICS

a3a74f85

02 May, 2017 5 commits

Remove dead code from catalog.py · 2c8a44fc

Committed by Daniel Gustafsson on May 02, 2017

While looking at other things, spotted that there were quite a few
functions in catalog.py that were unused. Remove these, and also
remove imports of gppylib.db.catalog when unused.

2c8a44fc

gpperfmon: retain line breaks in query logging · f2513c19

Committed by C.J. Jameson on May 01, 2017

- We think that external tables previously broke with newlines. This is
  no longer the case, so removing space replacement logic.
Signed-off-by: Marbin Tan <mtan@pivotal.io>
Signed-off-by: Larry Hamel <lhamel@pivotal.io>

f2513c19

Fixing storage tests with ORCA turned ON[#138725549] · fd45358c

Committed by Ekta Khanna and Jimmy Yih on Apr 27, 2017

For appendonly_read_check, setting `optimizer_disable_missing_stats_collection=on`
to hide the additional NOTICES/HINTS produced by ORCA.

For uao/uaocs crash on update tests, we are setting `optimizer=off` as ORCA does
not go through the code path intended for these tests.

fd45358c

Fix resource leaks in gp_dump by freeing variables when they go out of scope · c98b7eff
Committed by Jamie McAtamney on Apr 26, 2017
```
Authors: Karen Huddleston, Jamie McAtamney
```
c98b7eff

removing pgcrypto fips info from best practices guide (#2335) · 9f6727b0
Committed by David Yozie on May 01, 2017

9f6727b0

01 May, 2017 3 commits

Ensure reader gang for subtransaction test is on segments [#138725549] · 630d6e85

Committed by Ekta Khanna, Jesse Zhang and Xin Zhang on Apr 26, 2017

The sub-transaction tests are injecting faults at a point that only
happens in a reader gang on segments. The original query, when planned
by ORCA, would lead to a plan where the reader slice executes on master.
This commit changes the test (while mostly staying true to its spirit):
instead of using `INSERT INTO ... VALUES ...` we put the value in a
temporary table, and do an `INSERT INTO ... SELECT ... FROM`. The new
test should now pass under both ORCA and planner.

630d6e85

fix invalid dita. · 271fe22d
Committed by mkiyama on May 01, 2017

271fe22d

Clean up gp_optimizer tests. Remove extra dashes · 6296f43f
Committed by Venkatesh Raghavan on Apr 30, 2017

6296f43f

29 Apr, 2017 6 commits

fix unstable cases when using ORCA optimizer (#2326) · b619afa2

Committed by Eric Wu on Apr 29, 2017

* polymorphism -- ORCA will not add an order by clause in a derived table,
so we add 'limit 3' to force it to add a SORT node and make sure the test
result is consistent.

* alter_table_aocs -- fix the input data so that order by gets a stable
result

b619afa2

Remove SQL from walrep tests that does nothing [#138725549] · 95bbf84e

Committed by Jimmy Yih and Jesse Zhang on Apr 28, 2017

This particular delete line fails to do anything because the subquery
returns nothing. Remove the line since it does not do anything for the
walrep tests.

95bbf84e

PR Pipeline: test gpperfmon on all pull requests · 13e6686b

Committed by C.J. Jameson on Apr 28, 2017

- gpperfmon tracks a variety of queries and statistics and system
  properties and metrics in a dedicated database
- Until recently, it hadn't been tested or built very reliably here. It
  is now part of the gpdb_master pipeline.
- We are doing a lot of work to revitalize and provide tests for this
  feature; we plan to make many upcoming pull requests for this
  feature in different ways
- We will look into testing this with installcheck, but for now, it is
  tested with Behave
- Building it as part of PRs will help us get faster feedback
- This should add negligible time to compilation: the tests themselves
  run in less than 5 minutes typically, and will be running in parallel
  with ICW
Signed-off-by: Marbin Tan <mtan@pivotal.io>

13e6686b

removing/correcting unnecessary references to 4.3 software; removing pgcrypto fips (#2316) · 5a7a878b

Committed by David Yozie on Apr 28, 2017

* moving linux kerberos client instructions to dedicated topic, and placing it parallel with existing windows topic

* removing map and dita files for connectivity package, which is no longer provided

* removing more references to connectivity packages; conditionalizing postgres odbc/jdbc vs datadirect drivers for oss/pivotal audiences

* adding .ditaval for handling pivotal conditions; [ci skip]

* relocating pivotal ditaval

* adding .ditaval for handling pivotal conditions; [ci skip]

* relocating pivotal ditaval

* removing/fixing old references to 4.3 version

* removing pgcrypto.fips mentions

* small fix to version format

* removing 'orca' from package version strings

* changing rhel5 to rhel6 in example version strings

5a7a878b

gpperfmon: Add behave test for segment_history · 93778a25
Committed by Marbin Tan on Apr 28, 2017
```
Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
```
93778a25
remove all emcconnect references · 865b55f7
Committed by Larry Hamel on Apr 25, 2017
```
Signed-off-by: Marbin Tan <mtan@pivotal.io>
```
865b55f7

28 Apr, 2017 7 commits

Remove spurious whitespace in gp_session_state · 980d08db
Committed by Daniel Gustafsson on Apr 28, 2017

980d08db

Move gp_session_state to contrib/gp_internal_tools · 77a19ccb

Committed by Daniel Gustafsson on Apr 28, 2017

src/bin was an incorrect location for gp_session_state, and while
contrib might not be ideal it's better than src/bin at least. Move
into the gp_internal_tools contrib module which contains various
Greenplum specific management functions and views. During the next
release cycle we should perhaps make this a proper backend function
and move it into gp_toolkit but for now let's collect these utils in
one place.

77a19ccb

Remove dead code in gp_internal_tools · 9a20f878

Committed by Daniel Gustafsson on Apr 28, 2017

The gp_aoseg_history function was rolled into gp_ao_co_diagnostics
some time ago and has since been disconnected. The cache_stats
function in gp_workfile_mgr was simply not used and never exposed
such that a user could use it.

9a20f878

Move gp_workfile_mgr to internal_tools contrib · 367a096d

Committed by Daniel Gustafsson on Apr 28, 2017

The gp_workfile_mgr functions are installed via gp_toolkit, so
being in src/bin is the wrong place for such code. Move to the
gp_internal_tools contrib, which also is the wrong place, but
at least has precedent for containing gp_toolkit functions.

These should at some point be made into backend functions and
moved from contrib, but not at this point in the cycle.

367a096d

Typos and spelling in workfile unittest code comments · 103aea0f
Committed by Daniel Gustafsson on Apr 28, 2017
```
Minor copyediting, spelling fixes and style tweaks spotted while
glancing over the code.
```
103aea0f

Move workfile manager unittest code to src/test/regress · 537fba89

Committed by Daniel Gustafsson on Apr 28, 2017

The workfile_manager unittest code was previously included in all
builds; this is, however, functionality which shouldn't be in prod
installations. This moves all unit test code to src/test/regress
and links it to the regress object such that it can be tested with
the normal pg_regress process.

Add a random test from the suite into ICW just to exercise it; none
of these tests were ever executed in automated suites before, so we
are not sure what we actually want to run.

537fba89

Up MASTER_MAX_CONNECT for ICG · c25008b3

Committed by Jesse Zhang on Apr 27, 2017

Commit 35e0dfdb first bumped this up for an isolation test.
Higher and higher concurrency has since crept into installcheck,
resulting in a flaky group that contains `mapred`. This hopefully
addresses that flake.

c25008b3