1. 20 June 2018, 4 commits
    • Update README after concourse domain change · 165997a1
      Jim Doty committed
    • Add tests for subqueries nested inside a scalar expression · dd77c59c
      Dhanashree Kashid committed
      Add tests to ensure sane behavior when a subquery appears nested inside
      a scalar expression, for example a subquery inside an arithmetic
      expression such as `(SELECT max(b) FROM t2) + 1`. The intent is to
      check for correct results.
      
      Bump ORCA version to 2.63.0
      Signed-off-by: Shreedhar Hardikar <shardikar@pivotal.io>
    • Add pg_log directory to basebackup static exclusion list · 292ef134
      Jimmy Yih committed
      The pg_log directory has always been excluded using the pg_basebackup
      exclude option (-E ./pg_log). With this change, we add it to the
      static exclusion list inside basebackup. Because of this, we are able
      to remove all instances of mkdir pg_log in our management utilities.
      Previously, the utilities always had to create the pg_log directory
      after running pg_basebackup because the postmaster validates that the
      pg_log path exists.
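
      A rough Python sketch of the step the utilities can now drop; the
      function name, paths, and flags below are illustrative, not the actual
      utility code:

      ```python
      import subprocess

      def take_basebackup(host, port, target_dir):
          # illustrative pg_basebackup invocation; real utilities pass more options
          subprocess.check_call(["pg_basebackup", "-h", host, "-p", str(port),
                                 "-D", target_dir, "-X", "stream"])
          # Previously required after the backup, because the postmaster
          # validates that the pg_log path exists:
          #     os.makedirs(os.path.join(target_dir, "pg_log"))
          # With pg_log on basebackup's static exclusion list, this manual
          # step goes away.
      ```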
      
      This also helps us align better with upstream Postgres, since the
      pg_basebackup exclude option is Greenplum-specific and not really
      needed at all. Our dynamic exclusion list hasn't changed for a very
      long time (so it's pretty much static anyway) and is not well
      maintained in the utilities. We may actually remove the pg_basebackup
      exclude option in the near future.
    • b143cd71
  2. 19 June 2018, 9 commits
    • docs - docs and updates for pgbouncer 1.8.1 (#5151) · a99194e0
      Lisa Owen committed
      * docs - docs and updates for pgbouncer 1.8.1
      
      * some edits requested by david
      
      * add pgbouncer config page to see also, include directive
      
      * add auth_hba_type config param
      
      * ldap - add info to migrating section, remove ldap passwds
      
      * remove ldap note
    • Update utilities to capture hyperloglog counter · aa5fe3d5
      Omer Arap committed
      This commit updates the GPSD utility to capture the value of the
      `stainherit` column and the HLL counters stored in the `stavalues4`
      column of the `pg_statistic` table, generated by sample-based or
      full-table-scan HLL analyze.
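
      For illustration, a minimal sketch of the kind of `pg_statistic` query
      involved; the column names are real, but the function and cursor
      wiring are hypothetical, not the actual gpsd/minirepro code:

      ```python
      STATS_QUERY = """
          SELECT starelid, staattnum, stainherit,
                 stavalues4      -- HLL counter bytes written by HLL analyze
          FROM   pg_statistic
          WHERE  starelid = %s
      """

      def dump_column_stats(cursor, relid):
          """Fetch per-column stats rows, now including stainherit and stavalues4."""
          cursor.execute(STATS_QUERY, (relid,))
          return cursor.fetchall()
      ```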
      
      This commit also updates the minirepro utility to capture the
      hyperloglog counters.
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Utilize hyperloglog and merge utilities to derive root table statistics · 9c1b1ae3
      Omer Arap committed
      This commit introduces an end-to-end, scalable solution for generating
      statistics of the root partitions. This is done by merging the
      statistics of the leaf partition tables to generate the statistics of
      the root partition. Being able to merge leaf table statistics for the
      root table makes analyze incremental and stable.
      
      **CHANGES IN LEAF TABLE STATS COLLECTION:**
      
      Incremental analyze still creates a sample for each partition, as in
      the previous version. While analyzing the sample and generating
      statistics for the partition, it also creates a `hyperloglog_counter`
      data structure, adds the sample values to it, and records auxiliary
      information such as the number of multiples and the sample size. Once
      the entire sample is processed, analyze saves the
      `hyperloglog_counter` as a byte array in the `pg_statistic` catalog
      table. We reserve a slot for the `hyperloglog_counter` in the table
      and mark it with a dedicated statistic kind, `STATISTIC_KIND_HLL`. We
      only keep the `hyperloglog_counter` in the catalog for the leaf
      partitions. If the user chooses to run a full scan for HLL, we mark
      the kind as `STATISTIC_KIND_FULLHLL`.
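
      A schematic Python sketch of this per-leaf collection step; an exact
      set stands in for the approximate, mergeable `hyperloglog_counter` so
      the sketch runs, and all names here are illustrative:

      ```python
      import pickle
      from collections import Counter

      def analyze_leaf_sample(sample_values, full_scan=False):
          counter = set(sample_values)     # stand-in for the hyperloglog_counter
          counts = Counter(sample_values)
          nmultiple = sum(1 for c in counts.values() if c > 1)  # values seen more than once
          kind = "STATISTIC_KIND_FULLHLL" if full_scan else "STATISTIC_KIND_HLL"
          # the serialized counter, together with the sample size and number
          # of multiples, is stored in a reserved pg_statistic slot under `kind`
          payload = pickle.dumps({"counter": counter,
                                  "sample_size": len(sample_values),
                                  "nmultiple": nmultiple})
          return kind, payload
      ```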
      
      **MERGING LEAF STATISTICS**
      
      Once all the leaf partitions are analyzed, we analyze the root
      partition. We first check whether all the partitions have been
      analyzed properly and have their statistics available in the
      `pg_statistic` catalog table. A partition with no tuples is considered
      analyzed even though it has no catalog entry. If for some reason any
      partition is not analyzed, we fall back to the original analyze
      algorithm, which acquires a sample for the root partition and
      calculates statistics based on that sample.
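
      Schematically, the check and fallback decision might look like this
      (field names are illustrative, not the actual catalog accessors):

      ```python
      def can_merge_root_stats(leaf_parts):
          """leaf_parts: per-leaf dicts with 'reltuples' and 'has_stats' flags
          gathered from the catalog (illustrative names)."""
          for part in leaf_parts:
              if part["reltuples"] == 0:
                  continue              # an empty partition counts as analyzed
              if not part["has_stats"]:
                  return False          # fall back to sampling the root partition
          return True
      ```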
      
      Merging the null fraction and average width from leaf partition
      statistics is trivial and poses no significant challenge, so we
      calculate them first (see the sketch after the list below). The
      remaining statistics are:
      
      - Number of distinct values (NDV)
      
      - Most common values (MCVs) and their frequencies, termed most common
      frequencies (MCF)
      
      - Histograms that represent the distribution of the data values in the
      table
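
      A minimal sketch of the "trivial" part, assuming the merged null
      fraction and average width are row-count weighted averages over the
      leaf statistics (names are illustrative):

      ```python
      def merge_nullfrac_and_width(leaf_stats):
          """leaf_stats: list of (reltuples, null_frac, avg_width) per leaf partition."""
          total_rows = sum(rows for rows, _, _ in leaf_stats) or 1.0
          null_frac = sum(rows * nf for rows, nf, _ in leaf_stats) / total_rows
          avg_width = sum(rows * w for rows, _, w in leaf_stats) / total_rows
          return null_frac, avg_width
      ```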
      
      **Merging NDV:**
      
      Hyperloglog provides the functionality to merge multiple
      `hyperloglog_counter`s into one and to calculate the number of
      distinct values from the aggregated `hyperloglog_counter`. This
      aggregated counter alone is sufficient only if the user chooses to run
      a full scan for hyperloglog. In the sample-based approach, deriving
      the number of distinct values without the hyperloglog algorithm is not
      possible. Hyperloglog enables us to merge the `hyperloglog_counter`s
      from each partition and calculate the NDV on the merged counter with
      an acceptable error rate. However, this does not give us the final NDV
      of the root partition; it gives us the NDV of the union of the samples
      from each partition.
      
      The rest of the NDV interpolation follows the formula used in
      Postgres, which depends on four metrics: the NDV in the sample, the
      number of multiple values (values seen more than once) in the sample,
      the sample size, and the total rows in the table. Using these values
      the algorithm calculates the approximate NDV for the table. While
      merging the statistics from the leaf partitions, hyperloglog lets us
      accurately derive the NDV of the sample, the sample size, and the
      total rows; however, the number of multiples in the accumulated sample
      is unknown, since we do not have access to the accumulated sample at
      this point.
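
      A sketch of that interpolation, following the Duj1-style estimator
      Postgres ANALYZE uses; the variable names are mine, and the
      merged-sample inputs come from the merged HLL counters:

      ```python
      def interpolate_root_ndv(sample_ndv, sample_nmultiple, sample_rows, total_rows):
          d = float(sample_ndv)                        # NDV of the merged sample
          n = float(sample_rows)                       # combined sample size
          N = float(total_rows)                        # total rows in the root partition
          if N <= 0:
              return 0.0
          f1 = max(d - float(sample_nmultiple), 0.0)   # values seen exactly once
          denom = (n - f1) + f1 * n / N                # Duj1-style denominator
          ndv = n * d / denom if denom > 0 else d
          return min(ndv, N)                           # NDV can never exceed the row count
      ```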
      
      _Number of Multiples_
      
      Our approach to estimating the number of multiples in the root's
      aggregated sample (which itself is unavailable) requires the NDV, the
      number of multiples, and the size of each leaf sample. The NDV of each
      sample is trivial to calculate using the partition's
      `hyperloglog_counter`. The number of multiples and the sample size for
      each partition are saved in the partition's `hyperloglog_counter`
      during leaf statistics gathering so that they can be used in the merge.
      
      Estimating the number of multiples in the aggregate sample for the
      root partition is a two-step process. First, we accurately estimate
      the number of values that reside in more than one partition's sample.
      Then, we estimate the number of multiples that exist uniquely within a
      single partition. Finally, we add these values to estimate the overall
      number of multiples in the aggregate sample of the root partition.
      
      To count the number of values that exist in only a single partition,
      we utilize hyperloglog functionality: we can easily estimate how many
      values appear only in a specific partition _i_. We call the NDV of the
      aggregate of all partitions `NDV_all`, and the NDV of the aggregate of
      all partitions except _i_ `NDV_minus_i`. The difference between
      `NDV_all` and `NDV_minus_i` yields the number of values that appear in
      only one partition. The remaining values contribute to the overall
      number of multiples in the root's aggregated sample; we call their
      count `nMultiple_inter`, the number of values that appear in more than
      one partition.
      
      However, that is not enough: even if a value resides in only one
      partition, that partition might contain multiple copies of it. We need
      a way to account for these values, which is why we also count the
      multiples that exist uniquely within a partition's sample. We already
      know the number of multiples inside each partition's sample, but we
      need to normalize this value by the ratio of the number of values
      unique to the partition's sample to the number of distinct values in
      the partition's sample. The normalized value is partition sample
      _i_'s contribution to the overall calculation of the nMultiple.
      
      Finally, `nMultiple_root` is the sum of `nMultiple_inter` and the
      `normalized_m_i` of each partition sample.
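
      Putting the pieces above together, a sketch of the estimate; the
      inputs are the per-partition HLL estimates described above, and the
      names are illustrative:

      ```python
      def estimate_root_nmultiple(ndv_all, ndv_minus, leaf_nmultiple, leaf_ndv):
          """ndv_all:            HLL estimate over all leaf samples merged together
          ndv_minus[i]:       HLL estimate over all leaf samples except partition i
          leaf_nmultiple[i]:  number of multiples in partition i's sample
          leaf_ndv[i]:        NDV of partition i's sample"""
          unique_to = [max(ndv_all - m, 0.0) for m in ndv_minus]  # values only in partition i
          nmultiple_inter = max(ndv_all - sum(unique_to), 0.0)    # values in >1 sample
          normalized = [m * (u / d)
                        for m, u, d in zip(leaf_nmultiple, unique_to, leaf_ndv)
                        if d > 0]
          return nmultiple_inter + sum(normalized)
      ```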
      
      **Merging MCVs:**
      
      We utilize the merge functionality imported from the 4.3 version of
      Greenplum DB. The algorithm is simple: we convert each MCV's frequency
      into a count and add the counts up for values that appear in more than
      one partition. After every candidate's count has been calculated, we
      sort the candidate values and pick the top ones, as limited by
      `default_statistics_target`. 4.3 blindly picked the values with the
      highest counts; we instead incorporate the same logic used in current
      Greenplum and Postgres and check whether a value is a real MCV by
      running the usual qualification tests. Therefore, even after the
      merge, the logic fully aligns with Postgres.
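
      A simplified sketch of that merge; the final qualification check here
      is only a stand-in for the fuller Postgres/Greenplum MCV test, and all
      names are illustrative:

      ```python
      def merge_leaf_mcvs(leaf_stats, target):
          """leaf_stats: list of (reltuples, mcv_values, mcv_freqs) per leaf partition.
          target: default_statistics_target, the maximum number of MCVs to keep."""
          counts, total_rows = {}, 0.0
          for reltuples, values, freqs in leaf_stats:
              total_rows += reltuples
              for value, freq in zip(values, freqs):
                  counts[value] = counts.get(value, 0.0) + freq * reltuples  # frequency -> count
          candidates = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:target]
          mcvs = []
          for value, count in candidates:
              freq = count / total_rows if total_rows else 0.0
              # stand-in qualification test: keep only values clearly above
              # the average per-candidate frequency
              if freq > 1.25 / max(len(counts), 1):
                  mcvs.append((value, freq))
          return mcvs
      ```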
      
      **Merging Histograms:**
      
      One of the main novel contributions of this commit is how we merge the
      histograms from the leaf partitions. In 4.3 we used a priority queue
      to merge the histograms from the leaf partitions. However, that
      approach is naive and loses important statistical information: in
      Postgres, the histogram is calculated over the values that did not
      qualify as MCVs. The 4.3 histogram merge logic did not take this into
      consideration, so significant statistical information was lost while
      merging the MCV values.
      
      We introduce a novel approach that feeds the MCVs from the leaf
      partitions that did not qualify as root MCVs into the histogram merge
      logic. To fully utilize the previously implemented priority queue
      logic, we treat each non-qualified MCV as the histogram of a so-called
      `dummy` partition. To be more precise, if an MCV m1 does not qualify,
      we create a histogram [m1, m1] that has only one bucket whose size is
      the count of this non-qualified MCV. When we merge the histograms of
      the leaf partitions and these dummy partitions, the merged histogram
      does not lose any statistical information.
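
      A schematic sketch of the merge with dummy partitions; it keeps only
      the sorted, priority-queue-style k-way merge idea and ignores the
      per-bucket count weighting of the real algorithm (names illustrative):

      ```python
      import heapq

      def merge_leaf_histograms(leaf_histograms, rejected_mcvs, nbuckets):
          """leaf_histograms: sorted bucket-bound lists, one per leaf partition.
          rejected_mcvs: (value, count) candidates that did not qualify as root
                         MCVs; each becomes a one-bucket [value, value] 'dummy'.
          nbuckets: number of buckets wanted in the merged histogram."""
          dummies = [[value, value] for value, _count in rejected_mcvs]
          merged = list(heapq.merge(*(leaf_histograms + dummies)))  # k-way merge
          if not merged:
              return []
          step = max(len(merged) // nbuckets, 1)
          bounds = merged[::step]              # keep roughly evenly spaced bounds
          if bounds[-1] != merged[-1]:
              bounds.append(merged[-1])
          return bounds
      ```
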
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Import and modify analyze utility functions from 4.3 · 7ea27fc5
      Omer Arap committed
      In the previous generation of analyze, GPDB provided features to merge
      statistics such as MCVs (most common values) and histograms for the
      root or mid-level partitions from the leaf partitions' statistics.
      
      This commit imports the utility functions for merging MCVs and
      histograms and modifies them based on the needs of the current version.
      Signed-off-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
    • 95279b10
    • Port hyperloglog extension into gpdb · a9301fdc
      Abhijit Subramanya committed
      - Port the hyperloglog extension into the contrib directory and make
      corresponding makefile changes to get it to compile.
      - Also modify initdb to install the HLL extension as part of gpinitsystem.
      Signed-off-by: Omer Arap <oarap@pivotal.io>
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Fix COPY TO ON SEGMENT processed counting · cb63e543
      Adam Lee committed
      The processed variable should not be reset while looping over all
      partitions.
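
      The fix, reduced to a toy sketch (names are illustrative):

      ```python
      def total_processed(rows_per_partition):
          processed = 0                # accumulate across partitions...
          for nrows in rows_per_partition:
              # ...instead of resetting `processed` here, which would report
              # only the last partition's row count
              processed += nrows
          return processed
      ```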
    • Fix COPY TO IGNORE EXTERNAL PARTITIONS · f118f4bd
      Adam Lee committed
      BeginCopy() returns a brand-new CopyState, but the value of
      skip_ext_partition was ignored because it was only set after
      BeginCopy() returned.
      
      It's a simple boolean of struct CopyStmt; there is no need to wrap it
      in options.
    • Update .gitignore files · e0e8f475
      Adam Lee committed
      To have a clean `git status` output.
  3. 18 June 2018, 1 commit
    • docs - gpbackup/gprestore new functionality. (#5157) · 6df82183
      Mel Kiyama committed
      * docs - gpbackup/gprestore new functionality.
      
      --gpbackup new option --jobs to backup tables in parallel.
      --gprestore  --include-table* options support restoring views and sequences.
      
      * docs - gpbackup/gprestore. fixed typos. Updated backup/restore of sequences and views
      
      * docs - gpbackup/gprestore - clarified information on dependent objects.
      
      * docs - gpbackup/gprestore - updated information on locking/quiescent state.
      
      * docs - gpbackup/gprestore - clarify connection in --jobs option.
  4. 16 June 2018, 1 commit
    • Fix incorrect modification of storageAttributes.compress. · 7c82d50f
      Ashwin Agrawal committed
      For CO tables, storageAttributes.compress only conveys whether block
      compression should be applied or not. RLE is performed as stream
      compression within the block, and hence whether
      storageAttributes.compress is true or false doesn't relate to RLE at
      all. So, with rle_type compression, storageAttributes.compress is true
      for compression levels > 1, where block compression is performed along
      with the stream compression. For compression level = 1,
      storageAttributes.compress is always false, as no block compression is
      applied. Since RLE doesn't relate to storageAttributes.compress, there
      is no reason to touch it based on rle_type compression.
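
      The rule described above, as a small illustrative sketch (the function
      name is mine, not the actual code path):

      ```python
      def block_compression_applies(compresstype, compresslevel):
          if compresstype == "rle_type":
              # RLE itself is stream compression inside the block; only
              # levels > 1 add block compression on top of it
              return compresslevel > 1
          return compresstype not in (None, "", "none")
      ```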
      
      Also, the problem manifests more due to the fact that in the
      datumstream layer the AppendOnlyStorageAttributes in DatumStreamWrite
      (`acc->ao_attr.compress`) is used to decide the block type, whereas
      the cdb storage layer functions use the AppendOnlyStorageAttributes
      from AppendOnlyStorageWrite
      (`idesc->ds[i]->ao_write->storageAttributes.compress`). Given this
      difference, changing just one of them, and unnecessarily at that, is
      bound to cause issues during insert.
      
      So, removing the unnecessary and incorrect update to
      AppendOnlyStorageAttributes.
      
      The test case showcases the failing scenario without the patch.
  5. 15 June 2018, 2 commits
  6. 14 June 2018, 4 commits
  7. 13 June 2018, 1 commit
  8. 12 June 2018, 8 commits
  9. 11 June 2018, 5 commits
  10. 09 June 2018, 3 commits
  11. 08 June 2018, 2 commits
    • Update time zone data files to tzdata release 2018e. · f9c94e87
      Tom Lane committed
      This commit pulls in the latest tzdata from Postgres 11. We
      intentionally left out the comment changes to
      `src/backend/utils/adt/datetime.c` because they are not applicable (yet).
      
      > DST law changes in North Korea.  Redefinition of "daylight savings" in
      > Ireland, as well as for some past years in Namibia and Czechoslovakia.
      > Additional historical corrections for Czechoslovakia.
      >
      > With this change, the IANA database models Irish timekeeping as following
      > "standard time" in summer, and "daylight savings" in winter, so that the
      > daylight savings offset is one hour behind standard time not one hour
      > ahead.  This does not change their UTC offset (+1:00 in summer, 0:00 in
      > winter) nor their timezone abbreviations (IST in summer, GMT in winter),
      > though now "IST" is more correctly read as "Irish Standard Time" not "Irish
      > Summer Time".  However, the "is_dst" column in the pg_timezone_names view
      > will now be true in winter and false in summer for the Europe/Dublin zone.
      >
      > Similar changes were made for Namibia between 1994 and 2017, and for
      > Czechoslovakia between 1946 and 1947.
      >
      > So far as I can find, no Postgres internal logic cares about which way
      > tm_isdst is reported; in particular, since commit b2cbced9 we do not
      > rely on it to decide how to interpret ambiguous timestamps during DST
      > transitions.  So I don't think this change will affect any Postgres
      > behavior other than the timezone-view outputs.
      >
      > Discussion: https://postgr.es/m/30996.1525445902@sss.pgh.pa.us
      (cherry picked from commit 234bb985)
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
    • Sync our copy of the timezone library with IANA release tzcode2018e. · 20269256
      Tom Lane committed
      The non-cosmetic changes involve teaching the "zic" tzdata compiler about
      negative DST.  While I'm not currently intending that we start using
      negative-DST data right away, it seems possible that somebody would try
      to use our copy of zic with bleeding-edge IANA data.  So we'd better be
      out in front of this change code-wise, even though it doesn't matter for
      the data file we're shipping.
      
      Discussion: https://postgr.es/m/30996.1525445902@sss.pgh.pa.us
      (cherry picked from commit b45f6613)