1. July 23, 2019 (6 commits)
    • Remove pipeline that is not being used. · f9d8eb1c
      Authored by Adam Berlin
    • Refactor and improve cdbpath_motion_for_join · e2731add
      Authored by Zhenghua Lyu
      This commit refactors the function `cdbpath_motion_for_join` to
      make it clearer and to generate better plans for some cases.
      
      In a distributed computing system, gathering distributed data into
      a SingleQE should always be the last choice. The previous code for
      General and SegmentGeneral locus, when not ok_to_replicate, would
      try to gather the other locus to a SingleQE. This commit improves
      on this by first trying to add a redistribute motion.
      
      The logic for the join result's locus (outer's locus is General):
        1. if outer is ok to replicate, then the result's locus is the
           same as the inner's locus
        2. if outer is not ok to replicate (e.g. left join or WTS cases):
           2.1 if inner's locus is Hashed or HashedOJ, try to
               redistribute outer to match inner; if that fails, make
               inner SingleQE
           2.2 if inner's locus is Strewn, try to redistribute
               outer and inner; if that fails, make inner SingleQE
           2.3 otherwise just return the inner's locus; no motion is
               needed
      
      The logic for the join result's locus (outer's locus is SegmentGeneral):
      - if both are SegmentGeneral:
           1. if both locus are equal, no motion is needed, simply return
           2. for update cases: if the result relation is SegmentGeneral,
              the update must execute on each segment of the result
              relation; if the result relation's numsegments is larger,
              the only solution is to broadcast the other
           3. otherwise no motion is needed; change both numsegments to
              the common value
      - if only one of them is SegmentGeneral:
           1. consider the update case: if the result relation is
              SegmentGeneral, the only solution is to broadcast the other
           2. if the other's locus is SingleQE or Entry, make
              SegmentGeneral match the other's locus
           3. the remaining possibility is that the other's locus is
              partitioned:
              3.1 if SegmentGeneral is not ok_to_replicate, try to add a
                  redistribute motion; if that fails, gather each to
                  SingleQE
              3.2 if SegmentGeneral's numsegments is larger, just return
                  the other's locus
              3.3 try to add a redistribute motion; if that fails, gather
                  each to SingleQE
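      The preference order above (redistribute first, gather to SingleQE
      only as a last resort) can be sketched in Python; all names and
      return strings are illustrative stand-ins, not the actual C
      implementation in cdbpath_motion_for_join:

```python
# Sketch of the branch where exactly one join side is SegmentGeneral.
# Hypothetical model: the real logic manipulates CdbPathLocus structs in C.

def join_locus_one_segment_general(other_locus, sg_numsegments,
                                   other_numsegments,
                                   updating_sg_result_rel,
                                   ok_to_replicate, can_redistribute):
    if updating_sg_result_rel:
        return "broadcast other"                    # step 1: update case
    if other_locus in ("SingleQE", "Entry"):
        return "SegmentGeneral -> " + other_locus   # step 2
    # step 3: the other side's locus is partitioned
    if not ok_to_replicate:                         # step 3.1
        return "redistribute" if can_redistribute else "gather to SingleQE"
    if sg_numsegments > other_numsegments:          # step 3.2
        return other_locus
    # step 3.3: prefer redistribution, fall back to gathering
    return "redistribute" if can_redistribute else "gather to SingleQE"
```

      The point of the refactor is visible in steps 3.1 and 3.3: the
      gather-to-SingleQE branch is only ever the fallback.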
    • Remove Replicated Locus in cdbpath_motion_for_join · 20248a31
      Authored by Zhenghua Lyu
      Locus type Replicated can only be generated by join operation.
      And in the function cdbpathlocus_join there is a rule:
          `<any locus type> join <Replicated> => any locus type`
      
      By proof by contradiction, it can be shown that when the code
      arrives here, it is impossible for either of the two input paths'
      locus to be Replicated. So we add two asserts here.
    • Re-enable `COPY (query) TO` on utility mode · 41a8cf29
      Authored by Adam Lee
      It was disabled by accident several months ago while implementing
      `COPY (query) TO ON SEGMENT`; re-enable it.
      
      ```
      commit bad6cebc
      Author: Jinbao Chen <jinchen@pivotal.io>
      Date:   Tue Nov 13 12:37:13 2018 +0800
      
          Support 'copy (select statement) to file on segment' (#6077)
      ```
      
      WARNING: there are no safety protections in utility mode; it is
      not recommended except in disaster recovery situations.
      Co-authored-by: Weinan WANG <wewang@pivotal.io>
    • Behave CLI tests: use n1-standard-2 for ccp clusters · d26a2d65
      Authored by David Krieger
      The recent sysctl changes (42930ed1) modified the ccp nodes.
      Somehow, this causes memory issues on our ccp nodes for
      Behave. There was a recent, similar modification for
      gpexpand (6f494638).
    • Rid resource group on hashagg spill evaluation (#8199) · 40d955d6
      Authored by Weinan WANG
      Resource group assumes that memory access is always faster than
      disk access, so it ties the hashagg executor node's spill
      mechanism into its memory management. If the hash table size
      exceeds `max_mem` under the resource group model, the hash table
      does not spill and fan out data; instead, resource group grants
      more memory to the hash table. However, this strategy increases
      the hash collision rate, causing performance regressions in some
      OLAP queries.

      We now ignore the resource group GUC when hashagg evaluates
      whether it needs to spill.
      Co-authored-by: Adam Li <ali@pivotal.io>
  2. July 22, 2019 (8 commits)
    • Fix memory quota calculation of aggregation · 79c83ea1
      Authored by Adam Lee
      MemoryAccounting_RequestQuotaIncrease() returns a number in bytes,
      but the caller here expects kB.
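      This is a plain unit-mismatch bug; the fix amounts to a bytes-to-kB
      conversion at the call site. A minimal sketch (the function name is
      an illustrative stand-in for the real accounting API):

```python
def request_quota_kb(request_quota_bytes):
    """Convert a byte-denominated quota request into the kB the
    aggregation memory-quota code expects."""
    return request_quota_bytes // 1024

# e.g. an 8 MB request becomes 8192 kB
request_quota_kb(8 * 1024 * 1024)
```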
    • Fix compile issue for PyGreSQL on MacOS. · 0acc22e1
      Authored by tubocurarine
      When building the file `_pg.so` on the MacOS platform, distutils
      invokes the clang compiler with the arguments `-arch x86_64 -arch
      i386`. But the type `int128` is not available for the i386
      architecture, so the following error occurs:
      
      ```
      In file included from pgmodule.c:32:
      In file included from include/postgres.h:47:
      include/c.h:427:9: error: __int128 is not supported on this target
      typedef PG_INT128_TYPE int128
              ^
      include/pg_config.h:838:24: note: expanded from macro 'PG_INT128_TYPE'
                             ^
      In file included from pgmodule.c:32:
      In file included from include/postgres.h:47:
      include/c.h:433:18: error: __int128 is not supported on this target
      ```
      
      By adding `['-arch', 'x86_64']` to `extra_compile_args`, distutils
      removes `-arch i386` from the compiler arguments, which fixes the
      compile error.
    • Refactor to remove raw_buf_done from external scan · 585e90b6
      Authored by Adam Lee
      scan->raw_buf_done was used only for custom external tables;
      refactor to remove the MERGE_FIXME.

      cstate->raw_buf_len is safe to use since we operate on
      pstate->raw_buf directly in this case.
    • Remove some unnecessary MERGE_FIXMEs · b6afd44c
      Authored by Adam Lee
      Regarding `isjoininner`: searching the history of the merge branch,
      it was removed by "e2fa76d8 - Use parameterized paths to generate
      inner indexscans more flexibly" upstream in 9.2. That MERGE_FIXME
      was there because, at the time, functions relying on `isjoininner`
      refused to compile.
    • Expand sreh rejected count to int64 · 1f8254a8
      Authored by Adam Lee
      If there are more than INT_MAX rejected rows, this will overflow.
      That is possible at least if you specify the segment reject limit
      as a percentage.

      Still, keep the SEGMENT REJECT LIMIT value itself as int; expanding
      it would break many things (such as the catalog) for too little
      benefit.
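      The failure mode can be illustrated by simulating a C signed 32-bit
      counter (Python ints do not overflow, so the wraparound is modelled
      explicitly):

```python
INT_MAX = 2**31 - 1

def wrap_int32(n):
    """Simulate C signed 32-bit two's-complement wraparound."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

# One rejected row past INT_MAX and the counter goes negative,
# which is why the counter is widened to int64.
rejected = wrap_int32(INT_MAX + 1)   # -> -2147483648
```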
    • Place cdbsreh counting into copy function · 1425f036
      Authored by Adam Lee
      Now the processed and rejected counting happen only in
      NextCopyFrom(), which reads the next tuple from the file; this
      makes much more sense.
    • Implement macros Trap and TrapMacro for frontend · c6cb74ab
      Authored by Adam Lee
      For future use, and to remove a MERGE_FIXME.
    • Keep the order of reusing idle gangs · 51a7ea27
      Authored by Ning Yu
      For example, in the same session:
      query 1 has 3 slices and creates gang 1, gang 2 and gang 3;
      query 2 has 2 slices, and we want it to reuse gang 1 and gang 2
      instead of other combinations such as gang 3 and gang 2.

      In this way, the two queries can use the same send-receive port
      pairs. This is useful on platforms like Azure, because Azure
      limits the number of distinct send-receive port pairs (AKA flows)
      within a certain time period.
      Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
      Co-authored-by: Paul Guo <pguo@pivotal.io>
      Co-authored-by: Ning Yu <nyu@pivotal.io>
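      The intended reuse order can be sketched as keeping the idle pool
      sorted and always handing out the lowest-numbered gangs first (a
      toy model; the real gang manager is C code inside the QD):

```python
class GangPool:
    """Toy model of ordered idle-gang reuse: lower gang ids (and hence
    their port pairs) are always preferred, so consecutive queries in a
    session reuse the same flows."""

    def __init__(self):
        self.idle = []          # idle gang ids, kept sorted

    def release(self, gang_id):
        self.idle.append(gang_id)
        self.idle.sort()        # reuse in creation order, not release order

    def acquire(self, n_slices):
        taken, self.idle = self.idle[:n_slices], self.idle[n_slices:]
        return taken

pool = GangPool()
for g in (3, 1, 2):             # query 1's gangs released out of order
    pool.release(g)
pool.acquire(2)                 # query 2 gets gangs [1, 2], not [3, 1]
```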
  3. July 20, 2019 (2 commits)
  4. July 19, 2019 (1 commit)
  5. July 18, 2019 (12 commits)
    • Remove remnants of ignore-header in atmsort/gpdiff · dd053277
      Authored by Daniel Gustafsson
      Commit 3168a627 removed support for
      ignoring table header whitespace differences in test output, but the
      patch was a few bricks shy of a load. There were enough leftover bits
      that the option could be invoked, but without it actually working.
      This removes the leftovers.
      
      Looking at this it became clear that we had a whitespace ignore which
      was dead code, as it couldn't be triggered from the outside. Rather
      than trying to revive more cruft in atmsort, this removes the code
      since we clearly aren't using it.
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
    • Address some minor comments. · 297b5618
      Authored by Adam Berlin
    • Remove unused function. · 249a551a
      Authored by Adam Berlin
    • Speed up tests by using xlog noop · b981d314
      Authored by Adam Berlin
      - also, wait for primaries to recover after panics
      - also, checkpoint at the end of each test to set the redo point
        so it does not leak into the next test
    • Fix direct dispatch answer file to use one-phase commit message. · 3d7c411a
      Authored by Hubert Zhang
      The one-phase commit message changed to `Distributed Commit
      (one-phase)`. We need to fix the new case introduced by commit
      #6f9368 in the direct dispatch answer file.
    • Fix flaky in query_info_hook_test when interconnect in TCP mode (#8163) · 31d8434a
      Authored by Wang Hao
      The goal of query_info_hook_test is to ensure query_info_collect_hook
      is placed in the proper locations for emitting query execution
      metrics. This test was flaky due to the uncertain calling order
      between the QD and QEs when the interconnect is in TCP mode.
      This fix simply silences all QEs from emitting messages.
      This is acceptable within the scope of this test because we just
      want to make sure the hooks are called at the correct time for
      each backend; it should not be disturbed by query dispatching
      between the QD and QEs.
    • Consider non direct dispatch cost for cached plan. · 6f936827
      Authored by Hubert Zhang
      A prepared statement binds parameters for each execution. It needs
      to decide whether to use a cached generic plan without params or a
      custom plan with params. In the past, GPDB used the plan cost plus
      the re-plan cost to choose between the generic and custom plan.
      But the generic plan does not contain the params, so it cannot
      generate a direct dispatch plan the way a custom plan can.

      A non direct dispatch plan introduces unnecessary QEs, which still
      need to go through the volcano model, do two-phase commit, and
      write the prepare xlog. So the cost of failing to generate a
      direct dispatch plan can in some cases be higher than the re-plan
      cost, which makes the custom plan run faster than the generic
      plan even though it must re-plan for every execution.

      Note that non direct dispatch cost is not yet considered by the
      planner; the planner treats direct dispatch as an optimization and
      always enables it when possible. But for prepared statements, the
      generic plan cannot generate a direct dispatch plan at all, so we
      need to account for this cost here. As a result, we introduce a
      non direct dispatch cost into the total cost, only for cached
      plans.
      Co-authored-by: Ning Yu <nyu@pivotal.io>
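      The decision described above can be sketched as a cost comparison
      in which only the cached (generic) plan pays a non-direct-dispatch
      penalty; the numbers and the penalty form are purely illustrative,
      not the planner's actual cost model:

```python
def choose_plan(generic_cost, custom_cost, replan_cost,
                nondirect_dispatch_penalty):
    """Pick generic vs. custom plan: the generic plan cannot direct-
    dispatch, so it carries an extra penalty; the custom plan instead
    pays re-planning on every execution."""
    generic_total = generic_cost + nondirect_dispatch_penalty
    custom_total = custom_cost + replan_cost
    return "generic" if generic_total <= custom_total else "custom"

# With a large dispatch penalty, re-planning every time wins:
choose_plan(100, 100, 20, 50)   # -> "custom"
```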
    • Bump ORCA version from 3.57.0 to 3.58.1 (#8170) · 34ca4a75
      Authored by Chris Hajas
      3.58.0 corresponds to commit "Only create PropConstraint hashmap if
      necessary".
      3.58.1 corresponds to commit "Fix stack-use-after-scope for
      `CCacheHashtableAccessor` instantiation".
      Authored-by: Chris Hajas <chajas@pivotal.io>
    • Docs: removing DCA defaults · bbd6a6d4
      Authored by dyozie
    • Docs: Removing DCA references · 9662565d
      Authored by dyozie
    • Add more entries to the mailmap · 6f06899b
      Authored by Daniel Gustafsson
      Happened to stumble over a commit by Asim that didn't seem to use
      the usual name, and sure enough. Also add a few others that I had
      lying around, awaiting more to make them worth committing.
    • Remove deletion of no longer existing files · e4edf981
      Authored by Daniel Gustafsson
      The list of files to clean in test/regress contained references
      to files no longer present.  pg_class32 was an intermediate file
      in a test for upgrades from Greenplum 3.2 to 3.3/3.4; cppudf.sql
      was added in an Orca testsuite commit which seems to have never
      used that file at all; gmon.out is an output file generated by
      gprof, and all gprof invocations have been removed from tests.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
      Reviewed-by: Jimmy Yih <jyih@pivotal.io>
  6. July 17, 2019 (7 commits)
    • Use the correct time unit in BackoffSweeper backend · 16522314
      Authored by Pengzhou Tang
      Commit bfd1f46c used the wrong time unit (expected ms, passed µs)
      in the BackoffSweeper backend, which prevents it from
      re-calculating the CPU shares in time; the normal backends then
      sleep for more CPU ticks than before in CHECK_FOR_INTERRUPTS,
      causing a performance downgrade.
    • Wait for a command to show up in pg_stat_activity · 828086fa
      Authored by Asim R P
      It takes a non-zero amount of time after a command is dispatched
      from a client until it appears in pg_stat_activity.  The test must
      wait before validating anything based on pg_stat_activity.  The
      wait logic was already added for one instance of such validation.
      This patch adds the wait logic for the remaining instance.
      
      Also found a way to avoid creating one table, while at it.
      
      Reviewed-by: Shaoqi Bai and Adam Berlin
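      The wait logic amounts to polling until a predicate holds or a
      deadline passes. A generic sketch (the predicate, timeout values,
      and the commented helper name are illustrative, not the test's
      actual code):

```python
import time

def wait_until(predicate, timeout_s=30.0, poll_s=0.5):
    """Poll `predicate` until it returns True or `timeout_s` elapses.
    Mirrors waiting for a dispatched command to become visible in
    pg_stat_activity before validating against it."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_s)
    return False

# e.g. wait_until(lambda: command_visible_in_pg_stat_activity(conn, "VACUUM"))
```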
    • Fix perfsummary.py script to recognize new optimizer name in EXPLAIN (#8152) · ad80a1a7
      Authored by Hans Zeller
      This tool looks at EXPLAIN plans and recognizes the line with the
      optimizer version. Recently, we added the string "(GPORCA)" to the
      optimizer name.
      
      The fix is to add parentheses to the characters we ignore in this line.
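      The fix boils down to including parentheses in the character class
      the script skips over on the optimizer line. A minimal sketch (the
      exact pattern and sample line in perfsummary.py may differ):

```python
import re

# Before: parentheses are not in the class, so "(GPORCA)" breaks the match.
OLD = re.compile(r"Optimizer(?: status)?: [\w .]+version")
# After: add ( and ) to the ignored characters.
NEW = re.compile(r"Optimizer(?: status)?: [\w ().]+version")

line = "Optimizer: Pivotal Optimizer (GPORCA) version 3.57.0"
```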
    • Adding a typedef for a type referenced in dead code (#8151) · 59b437e0
      Authored by Hans Zeller
      In my earlier PR #8149 I removed some typedefs using the CSpinlockOS
      class that no longer exists. However, one of those typedefs,
      ConnectionHT, was still used in the class. It did not cause any
      compilation errors, probably because this class is unused.
      
      To be consistent and to make the code easier to read, this PR adds back
      the missing typedef. In the long term we could consider removing the
      entire class COptServer.
    • Add conceptual info for MADlib deep learning (#8055) · 3f976f60
      Authored by David Yozie
    • Bump ORCA version from 3.55 to 3.57 (#8149) · 7c962aa6
      Authored by Hans Zeller
      * remove reference to deleted header file
      * Remove more references to CSpinlock
      * Bump ORCA version to 3.56 for PR503
      * Bump ORCA version to 3.57 for PR510
    • docs - pxf jdbc partition range improvements (#8107) · adb5ea89
      Authored by Lisa Owen
      * docs - pxf jdbc partition range improvements
      
      * misc edits requested by david
      
      * clarify use of RANGE and INTERVAL (hint), edits requested by francisco
  7. July 16, 2019 (4 commits)
    • gpexpand: improve timeout tests · 41646c8d
      Authored by Ning Yu
      Some gpexpand behave tests verify the behavior of the `--duration`
      and `--end` arguments: they expect the gpexpand data
      redistribution phase to exit after 2 seconds, before all of the
      redistribution is done.  However, internally gpexpand checks for
      the timeout at a 5-second interval, so if the data redistribution
      completes within 5 seconds the tests will fail. This is exactly
      what has been happening recently.

      Improve these tests by queueing more tables for redistribution.
    • fix win curl tool link failure problem (#8142) · 1051346d
      Authored by Xiaodong Huo
    • Add test cases for contrib/sslinfo and enable it (#8105) · 508bdb11
      Authored by Hao Wu
      Add certificates & keys and test cases for contrib/sslinfo.

      Use `echo` + `sed` to add/remove options in the postgresql.conf of
      the master node and standby node. This method can completely
      restore the SSL-related options in postgresql.conf. The downside
      is that this approach may overwrite existing certificates and keys
      under the data directory.

      We use newly created certificates and keys instead of the
      certificates in src/test/ssl/ssl, because some fields in those
      certificates are missing for the test.

      `sslinfo` is only compiled and packaged when the `--with-openssl`
      option is enabled in configuration; otherwise `sslinfo` is
      omitted.
    • gpexpand: Use larger instance_type for more memory · 6f494638
      Authored by Nikolaos Kalampalikis
      After introducing sysctl settings in commit 42930ed1, gpexpand
      fails to shut down a segment with:
      `'Shutdown failed! [Errno12] Cannot allocate memory'`

      This is most likely due to the n1-standard-1 instance_type having
      only 3.75 GB of memory, which is too small, causing our Python
      utilities to fail when allocating multiple threads. This can be
      worked around by setting the `-B` parallel_process option to a low
      value, or by setting the kernel parameter vm.overcommit_memory
      to 0. However, we would like to test the standard sysctl settings
      as recommended to customers, rather than supporting unrealistic
      edge cases such as machines with very low memory.
      Co-authored-by: Nikolaos Kalampalikis <nkalampalikis@pivotal.io>