- 20 Jun 2018, 4 commits
-
-
Committed by Jim Doty
-
Committed by Dhanashree Kashid
Add tests to ensure sane behavior when a subquery appears nested inside a scalar expression. The intent is to check for correct results. Bump ORCA version to 2.63.0. Signed-off-by: Shreedhar Hardikar <shardikar@pivotal.io>
-
Committed by Jimmy Yih
The pg_log directory has always been excluded using the pg_basebackup exclude option (-E ./pg_log). With this change, we add it to the static list inside of basebackup. Because of this change, we are able to remove all instances of mkdir pg_log in our management utilities. Previously, the utilities would always have to create the pg_log directory after running pg_basebackup because the postmaster does a validation check on the pg_log path existing. This also helps us align better with upstream Postgres since the pg_basebackup exclude option is Greenplum-specific and really not needed at all. Our dynamic exclusion list hasn't changed for a very long time (so it's pretty much static anyways) and is not maintained in the utilities very well. We may actually remove the pg_basebackup exclude option in the near future.
-
Committed by mkiyama
-
- 19 Jun 2018, 9 commits
-
-
Committed by Lisa Owen
* docs - docs and updates for pgbouncer 1.8.1
* some edits requested by david
* add pgbouncer config page to see also, include directive
* add auth_hba_type config param
* ldap - add info to migrating section, remove ldap passwds
* remove ldap note
-
Committed by Omer Arap
This commit updates the GPSD utility to capture the value of the column `stainherit`, and also the HLL counters stored in column `stavalues4` of the `pg_statistic` table, generated for sample- or full-table-scan-based HLL analyze. This commit also updates the minirepro utility to capture hyperloglog counters. Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
Committed by Omer Arap
This commit introduces an end-to-end scalable solution for generating statistics of root partitions. This is done by merging the statistics of the leaf partition tables to generate the statistics of the root partition. The ability to merge leaf table statistics for the root table makes analyze incremental and stable.

**CHANGES IN LEAF TABLE STATS COLLECTION:** Incremental analyze creates a sample for each partition, as in the previous version. While analyzing the sample and generating statistics for the partition, it also creates a `hyperloglog_counter` data structure and adds values from the sample to it, along with the number of multiples and the sample size. Once the entire sample is processed, analyze saves the `hyperloglog_counter` as a byte array in the `pg_statistic` catalog table. We reserve a slot for the `hyperloglog_counter` in the table and mark it with a specific statistic kind, `STATISTIC_KIND_HLL`. We only keep the `hyperloglog_counter` in the catalog for the leaf partitions. If the user chooses to run a FULL scan for HLL, we mark the kind as `STATISTIC_KIND_FULLHLL`.

**MERGING LEAF STATISTICS** Once all the leaf partitions are analyzed, we analyze the root partition. Initially, we check that all the partitions have been analyzed properly and that all their statistics are available in the `pg_statistic` catalog table. A partition with no tuples is considered analyzed even though it has no catalog entry. If for some reason a single partition is not analyzed, we fall back to the original analyze algorithm, which acquires a sample for the root partition and calculates statistics from that sample. Merging the null fraction and average width from leaf partition statistics is trivial and poses no significant challenge, so we calculate those first.
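The null-fraction and average-width merge described above is a row-count-weighted average; a minimal Python sketch (our own naming, not the actual C implementation):

```python
def merge_null_frac_avg_width(leaf_stats):
    """leaf_stats: list of (reltuples, null_frac, avg_width) per leaf partition."""
    total_rows = sum(rows for rows, _, _ in leaf_stats)
    if total_rows == 0:
        return 0.0, 0.0
    # Row-count-weighted averages across the leaves.
    null_frac = sum(rows * nf for rows, nf, _ in leaf_stats) / total_rows
    avg_width = sum(rows * w for rows, _, w in leaf_stats) / total_rows
    return null_frac, avg_width
```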
Then, the remaining statistics are:
- Number of distinct values (NDV)
- Most common values (MCVs) and their frequencies, termed most common frequencies (MCFs)
- Histograms representing the distribution of the data values in the table

**Merging NDV:** Hyperloglog provides the functionality to merge multiple `hyperloglog_counter`s into one and calculate the number of distinct values from the aggregated counter. This aggregated `hyperloglog_counter` is sufficient on its own only if the user chooses to run a full scan for hyperloglog. In the sample-based approach, deriving the number of distinct values without the hyperloglog algorithm is not possible. Hyperloglog enables us to merge the `hyperloglog_counter`s from each partition and calculate the NDV over the merged counter with an acceptable error rate. However, this does not give us the final NDV of the root partition; it gives us the NDV of the union of the samples from each partition. The rest of the NDV interpolation depends on the four metrics used in the Postgres formula: the NDV in the sample, the number of multiple values in the sample, the sample size, and the total rows in the table. Using these values, the algorithm calculates the approximate NDV for the table. While merging the statistics from the leaf partitions, hyperloglog lets us accurately generate the NDV of the sample, the sample size, and the total rows; the number of multiples in the accumulated sample, however, is unknown, since we do not have access to the accumulated sample at this point.

_Number of Multiples_ Our approach to estimating the number of multiples in the aggregated sample (which itself is unavailable) for the root requires the NDV, the number of multiples, and the size of each leaf sample. The NDV of each sample is trivial to calculate using the partition's `hyperloglog_counter`.
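The interpolation step can be sketched with the Duj1-style estimator that Postgres's analyze code applies (a simplified Python illustration; function and parameter names are ours):

```python
def estimate_ndv(d, nmultiple, n, N):
    """d: NDV in the sample; nmultiple: sample values seen more than once;
    n: sample size; N: total rows in the table."""
    f1 = d - nmultiple              # values occurring exactly once in the sample
    if nmultiple == 0:
        return N                    # no duplicates seen: assume the column is unique
    if f1 == 0:
        return d                    # every value repeated: assume we saw them all
    # Duj1 estimator: n*d / (n - f1 + f1*n/N)
    return n * d / (n - f1 + f1 * n / N)
```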
The number of multiples and the sample size for each partition are saved in the partition's `hyperloglog_counter` during leaf statistics gathering, to be used in the merge. Estimating the number of multiples in the aggregate sample for the root partition is a two-step process. First, we estimate the number of values that reside in more than one partition's sample. Then, we estimate the number of multiples that exist uniquely in a single partition's sample. Finally, we add these values to estimate the overall number of multiples in the aggregate sample of the root partition.

To count the values that exist in one single partition only, we use hyperloglog functionality: we can easily estimate how many values appear only in a specific partition _i_. We call the NDV of the aggregate of all partitions `NDV_all`, and the NDV of the aggregate of all partitions except _i_ `NDV_minus_i`. The difference between `NDV_all` and `NDV_minus_i` gives the values that appear in only one partition. The remaining values contribute to the overall number of multiples in the root's aggregated sample; we call their count `nMultiple_inter`, the number of values that appear in more than one partition.

However, that is not enough: even if a value resides in only one partition's sample, that sample might contain multiple copies of it. We need a way to account for these values as well. Recall that we already know the number of multiples inside each partition's sample; we normalize this value by the proportion of the values unique to that partition's sample to the number of distinct values in it. The normalized value, `normalized_m_i`, is partition sample _i_'s contribution to the overall nMultiple calculation. Finally, `nMultiple_root` is the sum of `nMultiple_inter` and the `normalized_m_i` of each partition sample.
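Using exact Python sets in place of hyperloglog counters, the estimation described above can be sketched as follows (purely illustrative; names are ours and the real code works on HLL counters, not sets):

```python
def estimate_nmultiple(samples, nmultiples):
    """samples: one set of sampled values per leaf partition;
    nmultiples: per-partition count of values seen more than once."""
    all_union = set().union(*samples)
    ndv_all = len(all_union)
    # NDV_all - NDV_minus_i = values appearing only in partition i's sample.
    uniques = []
    for i in range(len(samples)):
        minus_i = set().union(*(samples[j] for j in range(len(samples)) if j != i))
        uniques.append(ndv_all - len(minus_i))
    # Values appearing in more than one partition's sample.
    nmultiple_inter = ndv_all - sum(uniques)
    # Normalize each partition's own multiples by the share of its sample NDV
    # (len of the set) that is unique to that partition.
    normalized = sum(m * (u / len(s)) for m, u, s in zip(nmultiples, uniques, samples))
    return nmultiple_inter + normalized
```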
**Merging MCVs:** We utilize the merge functionality imported from Greenplum 4.3. The algorithm is straightforward: we convert each MCV's frequency into a count and add the counts up when a value appears in more than one partition. After every candidate's count has been calculated, we sort the candidate values and pick the top ones, as bounded by `default_statistics_target`. 4.3 previously picked the values with the highest counts blindly; we instead incorporate the same logic used in current Greenplum and Postgres, and run tests to check whether a value is a real MCV. Therefore, even after the merge, the logic fully aligns with Postgres.

**Merging Histograms:** One of the main novel contributions of this commit is how we merge the histograms from the leaf partitions. In 4.3, a priority queue was used to merge the histograms from the leaf partitions, but that approach is naive and loses important statistical information. In Postgres, the histogram is calculated over the values that did not qualify as MCVs. The 4.3 merge logic for histograms did not take this into consideration, so significant statistical information was lost when the MCV values were merged. We introduce a novel approach that feeds the MCVs from the leaf partitions that did not qualify as root MCVs into the histogram merge logic. To fully utilize the previously implemented priority queue logic, we treat each non-qualified MCV as the histogram of a so-called `dummy` partition. To be more precise, if an MCV m1 is a non-qualified MCV, we create a histogram [m1, m1] with a single bucket whose size is the count of this non-qualified MCV. When we merge the histograms of the leaf partitions and these dummy partitions, the merged histogram loses no statistical information. Signed-off-by: Jesse Zhang <sbjesse@gmail.com> Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
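The frequency-to-count conversion at the heart of the MCV merge can be sketched as follows (a simplified Python illustration that omits the extra MCV qualification tests; names are ours):

```python
from collections import Counter

def merge_mcvs(leaf_mcvs, stats_target=100):
    """leaf_mcvs: list of (reltuples, {value: frequency}) per leaf partition."""
    counts = Counter()
    total_rows = 0
    for rows, mcv in leaf_mcvs:
        total_rows += rows
        for value, freq in mcv.items():
            counts[value] += freq * rows      # frequency -> absolute count
    # Keep the top stats_target candidates by merged count, converting back
    # to frequencies against the root's total row count.
    return [(v, c / total_rows) for v, c in counts.most_common(stats_target)]
```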
-
Committed by Omer Arap
In the previous generation of analyze, GPDB provided features to merge statistics such as MCVs (most common values) and histograms for the root or mid-level partitions from the leaf partitions' statistics. This commit imports the utility functions for merging MCVs and histograms and modifies them based on the needs of the current version. Signed-off-by: Bhunvesh Chaudhary <bchaudhary@pivotal.io>
-
Committed by Omer Arap
-
Committed by Abhijit Subramanya
- Port the hyperloglog extension into the contrib directory and make the corresponding makefile changes to get it to compile.
- Also modify initdb to install the HLL extension as part of gpinitsystem.
Signed-off-by: Omer Arap <oarap@pivotal.io> Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
Committed by Adam Lee
The processed variable should not be reset while looping all partitions.
-
Committed by Adam Lee
BeginCopy() returns a brand new CopyState, but the value of skip_ext_partition was ignored because it is set after that call. It's a simple boolean in struct CopyStmt; there is no need to wrap it in options.
-
Committed by Adam Lee
To have a clean `git status` output.
-
- 18 Jun 2018, 1 commit
-
-
Committed by Mel Kiyama
* docs - gpbackup/gprestore new functionality. --gpbackup new option --jobs to backup tables in parallel. --gprestore --include-table* options support restoring views and sequences.
* docs - gpbackup/gprestore. fixed typos. Updated backup/restore of sequences and views
* docs - gpbackup/gprestore - clarified information on dependent objects.
* docs - gpbackup/gprestore - updated information on locking/quiescent state.
* docs - gpbackup/gprestore - clarify connection in --jobs option.
-
- 16 Jun 2018, 1 commit
-
-
Committed by Ashwin Agrawal
For a CO table, storageAttributes.compress only conveys whether block compression should be applied. RLE is performed as stream compression within the block, so storageAttributes.compress being true or false does not relate to RLE at all. With rle_type compression, storageAttributes.compress is true for compression levels > 1, where block compression is performed along with stream compression; for compress level = 1, storageAttributes.compress is always false, as no block compression is applied. Since RLE does not relate to storageAttributes.compress, there is no reason to touch it based on rle_type compression. The problem manifests more due to the fact that the datumstream layer uses the AppendOnlyStorageAttributes in DatumStreamWrite (`acc->ao_attr.compress`) to decide the block type, whereas the cdb storage layer functions use the AppendOnlyStorageAttributes from AppendOnlyStorageWrite (`idesc->ds[i]->ao_write->storageAttributes.compress`). Because of this difference, changing just one of them, and unnecessarily at that, is bound to cause issues during insert. So, remove the unnecessary and incorrect update to AppendOnlyStorageAttributes. The test case showcases the failing scenario without the patch.
-
- 15 Jun 2018, 2 commits
-
-
Committed by Divya Bhargov
* Rewrite circular buffer as a Python list
Since we end up returning a List object, we may as well keep it as a List object from the start. Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io> Co-authored-by: Divya Bhargov <dbhargov@pivotal.io>
-
Committed by Lisa Owen
* docs - resource group cpuset feature
* alter and create resource group sgml ref page updates
* gp_resource_group_cpu_limit applies to both CPU alloc modes
* add cpuset usage considerations
* restore ... fail, not backup
* misc edits, move note
-
- 14 Jun 2018, 4 commits
-
-
Committed by Ming LI
The hard-coded flag is not correct for all cases.
-
Committed by Nadeem Ghani
- Add mirrors with and without standby, and ensure that the host assignment is identical between the two.
- Add mirrors, then kill one, and ensure that gprecoverseg operates correctly on the newly added mirror.
Co-authored-by: Nadeem Ghani <nghani@pivotal.io> Co-authored-by: Jacob Champion <pchampion@pivotal.io>
-
Committed by Mel Kiyama
-
Committed by Mel Kiyama
* docs - update GUC optimizer_analyze_root_partition
-change default to on
-update description
* docs - optimizer_analyze_root_partition, fix typo
-
- 13 Jun 2018, 1 commit
-
-
Committed by Omer Arap
No hash was created for the new numeric format when it is a `NumericShort`. This commit resolves the issue.
-
- 12 Jun 2018, 8 commits
-
-
Committed by Jim Doty
For a while there were several jobs behind the nightly trigger. This necessitated some logic to include the nightly-trigger resource if any of a number of conditions were met. At the time of this commit, the only job using the resource is an AIX job, so the inclusion of the nightly-trigger resource can match the conditions that include that one job. This eliminates the "resource not used" error that can be seen when setting up a development version of the pipeline that does not include the AIX job. Authored-by: Jim Doty <jdoty@pivotal.io>
-
Committed by David Yozie
-
Committed by Jim Doty
When cloning a fresh copy of GPDB, running through the documented make process, and then running the make target for the demo cluster, three files get generated. This commit adds those files to the .gitignore files in their respective directories. Authored-by: Jim Doty <jdoty@pivotal.io>
-
Committed by Mel Kiyama
* docs - update GUC gp_ignore_error_table
-change set classification from system to session
-clarify INTO ERROR TABLE clause is not used.
* docs - update GUC gp_ignore_error_table - minor edits
-
Committed by Shoaib Lari
For long-running commands such as gpinitstandby with a large master data directory, the server takes a long time to respond, so there is no activity from the client to the server. If ClientAliveInterval is set, the server reports a timeout after ClientAliveInterval seconds. Setting a ServerAliveInterval value less than the ClientAliveInterval forces the client to send a null message to the server, thus avoiding the timeout. Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io> Co-authored-by: Shoaib Lari <slari@pivotal.io>
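As a sketch of the resulting configuration (the values here are illustrative assumptions, not mandated by the utilities): if the server's sshd_config has `ClientAliveInterval 60`, the client-side setting is equivalent to:

```
# client-side ssh config (illustrative values)
Host *
    # must be less than the server's ClientAliveInterval (assumed 60s here)
    ServerAliveInterval 30
```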
-
Committed by Mel Kiyama
* docs - gpbackup ddboost plugin - add replication feature
* docs - gpbackup ddboost plugin - fix typos
-
Committed by Alexandra Wang
A gate job is added for the Release Candidate to make sure that all the release candidate jobs passed for gpdb_src and bin_gpdb on the centos6, centos7 and sles11 platforms. The Release_Candidate job verifies that the commit SHA of gpdb_src and all the bin_gpdb resources are the same; if the versions don't match, the job fails. The bin_gpdb_[platform]_rc resources are put into a stable-builds bucket so that they can be consumed by integration and component pipelines. Co-authored-by: Alexandra Wang <lewang@pivotal.io> Co-authored-by: Kris Macoskey <kmacoskey@pivotal.io> Co-authored-by: Trevor Yacovone <tyacovone@pivotal.io>
-
Committed by Jamie McAtamney
We have added a test case to verify that the mirror configuration generated by gpaddmirrors with the `-s` option is indeed spread over different hosts for each of the primaries. Co-authored-by: Jim Doty <jdoty@pivotal.io> Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io> Co-authored-by: Nadeem Ghani <nghani@pivotal.io> Co-authored-by: Kevin Yeap <kyeap@pivotal.io> Co-authored-by: Shoaib Lari <slari@pivotal.io>
-
- 11 Jun 2018, 5 commits
-
-
Committed by Hubert Zhang
Follow src/pl/plpython/README.md to see how to build and use plpython3u on GPDB. Co-authored-by: Yandong Yao <yyao@pivotal.io>
-
Committed by Jialun
-
Committed by Violet Cheng
The gpperfmon queries_history table shows zero values in the "rows_out" column even though the queries returned several rows of output. This fix decreases the likelihood of this bug occurring, but it is still possible due to gpperfmon's harvest mode.
-
Committed by Adam Lee
1. Pass the external table encoding to COPY's options, then set cstate->file_encoding to it, for both reading and writing.
2. After the merge, the copy state no longer has a client encoding member, which used to be set to the target encoding so that the converted data was fetched as a client would; now the file encoding (from the COPY options) is passed to convert directly.
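The conversion path can be pictured with a tiny sketch (Python purely for illustration; the real code is C inside the COPY machinery, and the function name here is ours):

```python
def read_external(raw: bytes, file_encoding: str, db_encoding: str) -> bytes:
    # Convert straight from the per-table file encoding to the database
    # encoding, with no intermediate "client encoding" hop.
    return raw.decode(file_encoding).encode(db_encoding)

# b'caf\xe9' in latin-1 becomes b'caf\xc3\xa9' in utf-8
converted = read_external("café".encode("latin-1"), "latin-1", "utf-8")
```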
-
Committed by Adam Lee
Fix a build error:
gppc.c: In function ‘TFGetFuncExpr’:
gppc.c:1255:3: error: implicit declaration of function ‘exprType’ [-Werror=implicit-function-declaration]
  exprType(list_nth(fexpr->args, argno)) != typid)
  ^~~~~~~~
-
- 09 Jun 2018, 3 commits
-
-
Committed by Andreas Scherbaum
* Add start_ignore and end_ignore around all gp_inject_fault loads
-
Committed by Ashwin Agrawal
-
Committed by Lisa Owen
-
- 08 Jun 2018, 2 commits
-
-
Committed by Tom Lane
This commit pulls in the latest tzdata from Postgres 11. We intentionally left out comment changes to `src/backend/utils/adt/datetime.c` because it's not applicable (yet).

> DST law changes in North Korea. Redefinition of "daylight savings" in
> Ireland, as well as for some past years in Namibia and Czechoslovakia.
> Additional historical corrections for Czechoslovakia.
>
> With this change, the IANA database models Irish timekeeping as following
> "standard time" in summer, and "daylight savings" in winter, so that the
> daylight savings offset is one hour behind standard time not one hour
> ahead. This does not change their UTC offset (+1:00 in summer, 0:00 in
> winter) nor their timezone abbreviations (IST in summer, GMT in winter),
> though now "IST" is more correctly read as "Irish Standard Time" not "Irish
> Summer Time". However, the "is_dst" column in the pg_timezone_names view
> will now be true in winter and false in summer for the Europe/Dublin zone.
>
> Similar changes were made for Namibia between 1994 and 2017, and for
> Czechoslovakia between 1946 and 1947.
>
> So far as I can find, no Postgres internal logic cares about which way
> tm_isdst is reported; in particular, since commit b2cbced9 we do not
> rely on it to decide how to interpret ambiguous timestamps during DST
> transitions. So I don't think this change will affect any Postgres
> behavior other than the timezone-view outputs.
>
> Discussion: https://postgr.es/m/30996.1525445902@sss.pgh.pa.us

(cherry picked from commit 234bb985)
Co-authored-by: Jesse Zhang <sbjesse@gmail.com> Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
-
Committed by Tom Lane
The non-cosmetic changes involve teaching the "zic" tzdata compiler about negative DST. While I'm not currently intending that we start using negative-DST data right away, it seems possible that somebody would try to use our copy of zic with bleeding-edge IANA data. So we'd better be out in front of this change code-wise, even though it doesn't matter for the data file we're shipping. Discussion: https://postgr.es/m/30996.1525445902@sss.pgh.pa.us (cherry picked from commit b45f6613)
-