1. 17 Jul 2020 (2 commits)
    • docs - update utility docs with IP/hostname information. (#10379) · 54dbd926
      Authored by Mel Kiyama
      * docs - update utility docs with IP/hostname information.
      
      Add information to gpinitsystem, gpaddmirrors, and gpexpand ref. docs
      --Information about using hostnames vs. IP addresses
      --Information about configuring hosts that have multiple NICs
      
      Also updated some examples in gpinitsystem
      
      * docs - review comment updates. Add more information from dev.
      
      * docs - change examples to show valid configurations that support failover.
      Also fix typos and make minor edits.
      
      * docs - updates based on review comments.
    • docs - greenplumr input.signature (#10477) · 1c294e95
      Authored by Lisa Owen
  2. 16 Jul 2020 (3 commits)
  3. 15 Jul 2020 (5 commits)
    • Remove dead code contain_ctid_var_reference. · d229288a
      Authored by Zhenghua Lyu
      It was used to implement the dedup plan, which was
      refactored away by commit 9628a332.
      
      So this commit removes the now-unused function.
    • Fix flaky test case 'gpcopy' · 9480d631
      Authored by Pengzhou Tang
      The failing test case verifies that the command "copy lineitem to '/tmp/abort.csv'"
      can be cancelled after the COPY is dispatched to the QEs. To verify this, it checks
      that /tmp/abort.csv has fewer rows than lineitem.
      
      The cancellation logic in the code is:
      
      The QD dispatches the COPY command to the QEs; if the QD then gets a cancel
      interrupt, it sends a cancel request to the QEs. However, the QD keeps
      receiving data from the QEs even after getting the cancel interrupt: it
      relies on the QEs to receive the cancel request and explicitly stop copying
      data to the QD.
      
      Obviously, the QEs may already have copied out all of their data to the QD
      before they get the cancel request, so the test case cannot guarantee that
      /tmp/abort.csv has fewer rows than lineitem.
      
      To fix this, we just verify that the COPY command can be aborted with the
      message 'ERROR:  canceling statement due to user request'; the row-count
      verification is pointless here.
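      
      A hedged sketch of the revised check (the two-session layout and the
      pg_stat_activity filter are illustrative, not the actual isolation test):
      
      ```
      -- Session 1: start a COPY that writes out every row of lineitem.
      COPY lineitem TO '/tmp/abort.csv';
      
      -- Session 2: cancel it while the COPY is still running.
      SELECT pg_cancel_backend(pid)
      FROM pg_stat_activity
      WHERE query LIKE 'COPY lineitem%';
      
      -- Session 1 then fails with the message the test asserts on:
      -- ERROR:  canceling statement due to user request
      ```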
    • Cleanup idle reader gang after utility statements · d1ba4da5
      Authored by Hubert Zhang
      Reader gangs use a local snapshot to access the catalog; as a result, they
      do not synchronize with the sharedSnapshot from the writer gang, which
      leads to inconsistent visibility of catalog tables on an idle reader gang.
      Consider the case:
      
      select * from t, t t1; -- create a reader gang.
      begin;
      create role r1;
      set role r1;  -- the set command is also dispatched to the idle reader gang
      
      When the set role command is dispatched to the idle reader gang, the reader
      gang cannot see the new tuple for role r1 in the catalog table pg_authid.
      To fix this issue, we drop the idle reader gangs after each utility
      statement that may modify the catalog.
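      
      Continuing the example, a hedged sketch of how the stale snapshot could
      surface (the final query is illustrative, not taken from the commit):
      
      ```
      select * from t, t t1;  -- creates a reader gang
      begin;
      create role r1;
      set role r1;            -- also dispatched to the idle reader gang
      -- A later multi-slice query reuses the idle reader gang, whose local
      -- snapshot may not include the pg_authid tuple for r1, so it can fail
      -- even though the writer gang sees the new role.
      select * from t, t t1;
      ```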
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
    • Correct plan of general & segmentGeneral path with volatile functions. · d1f9b96b
      Authored by Zhenghua Lyu
      General and segmentGeneral locus imply that the corresponding slice, if
      executed on many different segments, should produce the same result data
      set. Thus, in some cases, General and segmentGeneral paths can be treated
      like broadcast.
      
      But what if a segmentGeneral or general locus path contains volatile
      functions? Volatile functions, by definition, do not guarantee the same
      result across invocations. So in such cases the path loses this property
      and cannot be treated as *general. Previously, the Greenplum planner did
      not handle these cases correctly. A Limit on a general or segmentGeneral
      path has the same issue.
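      
      A hedged illustration of the hazard (the query is invented for this
      example): a function scan has general locus, so the planner may evaluate
      it on every segment as if it were broadcast, but a volatile qual keeps a
      different subset on each segment:
      
      ```
      create table dist_t (a int) distributed by (a);
      -- Without the fix, treating the general-locus subquery as broadcast
      -- would let each segment keep its own random() subset of g, producing
      -- inconsistent join results across segments.
      select *
      from dist_t
      join (select g from generate_series(1, 10) g where random() < 0.5) s
        on dist_t.a = s.g;
      ```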
      
      The idea of the fix in this commit: when we find the pattern (a general or
      segmentGeneral locus path contains volatile functions), we create a motion
      path above it to turn its locus into singleQE, and then create a
      projection path. The core job then becomes choosing the places to check:
      
        1. For a single base rel, we only need to check its restrictions; this is
           at the bottom of the planner, in the function set_rel_pathlist
        2. When creating a join path, if the join locus is general or segmentGeneral,
           check its joinqual to see if it contains volatile functions
        3. When handling a subquery, we invoke the set_subquery_pathlist function;
           at the end of this function, check the targetlist and havingQual
        4. When creating a limit path, the same check-and-change algorithm should be used
        5. Correctly handle make_subplan
      
      The OrderBy clause and Group Clause are included in the targetlist and
      handled by Step 3 above.
      
      This commit also fixes DMLs on replicated tables. Update and Delete
      statements on a replicated table are special: they have to be dispatched
      to every segment to execute. So if they contain volatile functions in
      their targetList or where clause, we should reject such statements (see
      the sketch after the list):
      
        1. For the targetList, we check it in the function create_motion_path_for_upddel
        2. For the where clause, it is handled in the query planner: when we find the
           pattern and want to fix it, do an extra check for whether we are updating or
           deleting a replicated table, and if so reject the statement.
        3. The upsert case is handled in the transform stage.
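      
      A hedged illustration of these replicated-table rules (the table is
      invented for the example):
      
      ```
      create table rep_t (a int, b float) distributed replicated;
      
      -- Volatile function in the targetList: each segment would compute its
      -- own random() value and the replicas would diverge; rejected per
      -- check 1 above.
      update rep_t set b = random();
      
      -- Volatile function in the where clause: rejected per check 2.
      delete from rep_t where b < random();
      ```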
    • Fix uninitialized variable in pgrowlocks · 75283bc7
      Authored by Japin
      Because the variable rel is only used in the if (SRF_IS_FIRSTCALL()) branch,
      we move its declaration into that branch (suggested by Hubert Zhang).
  4. 14 Jul 2020 (3 commits)
  5. 13 Jul 2020 (4 commits)
    • Docs - remove HCI warning · 9eb9c2ac
      Authored by David Yozie
    • Update linux installation guide · ba5792fa
      Authored by Tyler Ramer
      Issue #10069 noted some problems with the Linux documentation.
      
      Update the documentation to be more accurate and to direct configuration
      steps to the appropriate documentation.
      Co-authored-by: Tyler Ramer <tramer@vmware.com>
      Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
    • Remove unused function pathnode_walk_node. · 7339a178
      Authored by Zhenghua Lyu
      Previously, `cdbpath_dedup_fixup` was the only function that invoked
      `pathnode_walk_node`, and it was removed by commit 9628a332.
      
      So this commit removes the now-unused function.
    • Fix flaky test for replication_keeps_crash. (#10423) · db60b003
      Authored by (Jerome)Junfeng Yang
      Remove the setting of `gp_fts_probe_retries` to 1, which may cause the FTS
      probe to fail. It was first added to reduce the test time, but such a low
      retry value may cause the test to fail while FTS is probing and updating
      the segment configuration. Since reducing `gp_fts_replication_attempt_count`
      also saves test time, skip altering `gp_fts_probe_retries`.
      
      Also found an assertion that may not hold when marking the mirror down
      happens before the walsender exits: this frees the replication status
      before the walsender exits and tries to record the disconnect info, which
      leads the segment to crash and start recovery.
  6. 10 Jul 2020 (11 commits)
    • ic-proxy: enable ic-proxy with --enable-ic-proxy · 81810a20
      Authored by Ning Yu
      We used to use the option --with-libuv to enable ic-proxy, but it was not
      straightforward to understand the purpose of that option. So we renamed it
      to --enable-ic-proxy, and the default setting is changed to "disable".
      
      Suggested by Kris Macoskey <kmacoskey@pivotal.io>
    • ic-proxy: let backends connect to the proxy bgworker · 94c9d996
      Authored by Ning Yu
      Only in proxy mode, of course. Currently the ic-proxy mode shares most of
      the backend logic with the ic-tcp mode, so instead of copying the code we
      embed the ic-proxy-specific logic in ic_tcp.c.
    • ic-proxy: launch as a bgworker · 5b60069c
      Authored by Ning Yu
    • ic-proxy: new value "proxy" in GUC gp_interconnect_type · 245ca266
      Authored by Ning Yu
      It is for the ic-proxy mode.
    • ic-proxy: make gp_interconnect_proxy_addresses a GUC · 3140a44f
      Authored by Ning Yu
    • ic-proxy: implement the core logic · 6188fb1f
      Authored by Ning Yu
      The interconnect proxy mode, a.k.a. ic-proxy, is a new interconnect mode
      in which all the backends communicate via a proxy bgworker. All the
      backends on the same segment share the same proxy bgworker, so every two
      segments need only one network connection between them, which reduces the
      network flows as well as the number of ports.
      
      To enable the proxy mode we first need to configure the GUC
      gp_interconnect_proxy_addresses, for example:
      
          gpconfig \
            -c gp_interconnect_proxy_addresses \
            -v "'1:-1:10.0.0.1:2000,2:0:10.0.0.2:2001,3:1:10.0.0.3:2002'" \
            --skipvalidation
      
      Then restart the cluster for the setting to take effect.
    • Store dbid in CdbProcess · 8804bf39
      Authored by Ning Yu
      It is a preparation for the ic-proxy mode: we need this information to
      distinguish a primary segment from its mirror.
    • Fix pyyaml windows build (#10451) · 3daafd2f
      Authored by Peifeng Qiu
      The local fork at gpMgmt/bin/ext/yaml was removed by 8d6c3059. Unpack it
      from gpMgmt/bin/pythonSrc/ext, just like pygresql.
    • [Refactor] Pull out KHeap into CKHeap.h · 9e8f261d
      Authored by Ashuka Xue
      Pull out the implementation of the binary heap into its own templated
      header file.
    • Make histograms commutative when merging · 9b427611
      Authored by Ashuka Xue
      Prior to this commit, merging two histograms was not commutative:
      histogram1->Union(histogram2) could result in a row estimate of 1500 rows,
      while histogram2->Union(histogram1) could result in a row estimate of
      600 rows.
      
      Now, MakeBucketMerged has been renamed to SplitAndMergeBuckets. This
      function, which calculates the statistics for the merged bucket, now
      consistently returns the same histogram buckets regardless of the order
      of its inputs. This, in turn, makes MakeUnionHistogramNormalize and
      MakeUnionAllHistogramNormalize commutative.
      
      Once we have successfully split the buckets and merged them as necessary,
      we may have generated up to 3X the number of buckets that were originally
      present. Thus we cap the number of buckets at either the max size of the
      two incoming histograms or 100 buckets.
      
      CombineBuckets then reduces the size of the histogram by combining
      consecutive buckets that carry similar information. It does this using a
      combination of two ratios, freq/ndv and freq/bucket_width: when both
      ratios are similar across consecutive buckets, merging preserves the
      estimates for equality and range predicates respectively. These two
      ratios were chosen based on the following examples:
      
      Assuming that we calculate row counts for selections like the following:
      - For a predicate col = const: rows * freq / NDVs
      - For a predicate col < const: rows * (sum of full or fractional frequencies)
      
      Example 1 (rows = 100), where freq/width, ndv/width and freq/ndv are all the same:
        ```
        Bucket 1: [0, 4)   freq .2  NDVs 2  width 4  freq/width = .05 ndv/width = .5 freq/ndv = .1
        Bucket 2: [4, 12)  freq .4  NDVs 4  width 8  freq/width = .05 ndv/width = .5 freq/ndv = .1
        Combined: [0, 12)  freq .6  NDVs 6  width 12
        ```
      
      This should give the same estimates for various predicates, with separate or combined buckets:
      ```
      pred          separate buckets         combined bucket   result
      -------       ---------------------    ---------------   -----------
      col = 3  ==>  100 * .2 / 2           = 100 * .6 / 6    = 10 rows
      col = 5  ==>  100 * .4 / 4           = 100 * .6 / 6    = 10 rows
      col < 6  ==>  100 * (.2 + .25 * .4)  = 100 * .5 * .6   = 30 rows
      ```
      
      Example 2 (rows = 100), freq and ndvs are the same, but width is different:
      ```
      Bucket 1: [0, 4)   freq .4  NDVs 4  width 4  freq/width = .1 ndv/width = 1 freq/ndv = .1
      Bucket 2: [4, 12)  freq .4  NDVs 4  width 8  freq/width = .05 ndv/width = .5 freq/ndv = .1
      Combined: [0, 12)  freq .8  NDVs 8  width 12
      ```
      
      This will give different estimates with the combined bucket, but only for non-equal preds:
      ```
      pred          separate buckets         combined bucket   results
      -------       ---------------------    ---------------   --------------
      col = 3  ==>  100 * .4 / 4           = 100 * .8 / 8    = 10 rows
      col = 5  ==>  100 * .4 / 4           = 100 * .8 / 8    = 10 rows
      col < 6  ==>  100 * (.4 + .25 * .4) != 100 * .5 * .8     50 vs. 40 rows
      ```
      
      Example 3 (rows = 100), now NDVs / freq is different:
      ```
      Bucket 1: [0, 4)   freq .2  NDVs 4  width 4  freq/width = .05 ndv/width = 1 freq/ndv = .05
      Bucket 2: [4, 12)  freq .4  NDVs 4  width 8  freq/width = .05 ndv/width = .5 freq/ndv = .1
      Combined: [0, 12)  freq .6  NDVs 8  width 12
      ```
      
      This will give different estimates with the combined bucket, but only for equal preds:
      ```
      pred          separate buckets         combined bucket   results
      -------       ---------------------    ---------------   ---------------
      col = 3  ==>  100 * .2 / 4          != 100 * .6 / 8      5 vs. 7.5 rows
      col = 5  ==>  100 * .4 / 4          != 100 * .6 / 8      10 vs. 7.5 rows
      col < 6  ==>  100 * (.2 + .25 * .4)  = 100 * .5 * .6   = 30 rows
      ```
      
      This commit also adds an attribute to the statsconfig for MaxStatsBuckets
      and changes the scaling method when creating singleton buckets.
    • [Refactor] Update MakeStatsFilter, Rename CreateHistMashMapAfterMergingDisjPreds -> MergeHistogramMapsforDisjPreds · c14fbb92
      Authored by Ashuka Xue
      
      This commit refactors MakeStatsFilter to use
      MakeHistHashMapConjOrDisjFilter instead of individually calling
      MakeHistHashMapConj and MakeHistHashMapDisj.
      
      This commit also modifies MergeHistogramMapsForDisjPreds to avoid copying
      and creating unnecessary histogram buckets.
  7. 09 Jul 2020 (4 commits)
  8. 08 Jul 2020 (8 commits)