- 11 Feb 2020, 2 commits
-
-
By Adam Lee
Remove gpperfmon, since we have the alternative metrics collector.
1. Remove all gpperfmon code, including gpmon, gpsmon, gpmmon, gpperfmon, and alert_log.
2. Remove all gpperfmon GUCs: `gp_enable_gpperfmon`, `gp_perfmon_segment_interval`, `gp_perfmon_print_packet_info`, `gpperfmon_port`, and `gpperfmon_log_alert_level`.
3. Remove the `perfmon` and `stats sender` processes/bgworkers.
4. Remove the `apu` and `libsigar` dependencies.
-
By Jamie McAtamney
Previously, gpstart could not start the cluster if a standby master host was configured but currently down. In order to check whether the standby was supposed to be the acting master (and prevent the master from being started if that was the case), gpstart needed to access the standby host to retrieve the TimeLineID of the standby, and if the standby host was down the master would not start. This commit modifies gpstart to assume that the master host is the acting master if the standby is unreachable, so that it never gets into a state where neither the master nor the standby can be started.
Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
Co-authored-by: Mark Sliva <msliva@pivotal.io>
Co-authored-by: Adam Berlin <aberlin@pivotal.io>
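The decision described above can be sketched as a small decision function. This is an illustrative sketch only, not the actual gpstart source; the function name, parameters, and timeline comparison are assumptions made for clarity.

```python
# Hypothetical sketch of gpstart's start-up decision (not the real code).
def may_start_master(standby_configured, standby_reachable,
                     master_tli=None, standby_tli=None):
    """Return True if gpstart should treat the master as the acting master."""
    if not standby_configured:
        return True
    if not standby_reachable:
        # The fix described above: if the standby cannot be contacted,
        # assume the master is the acting master rather than refusing
        # to start anything at all.
        return True
    # Standby reachable: only start the master if its timeline is current.
    return master_tli >= standby_tli
```

The key change is the unreachable-standby branch: before the fix, an unreachable standby aborted startup entirely.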
-
- 10 Feb 2020, 2 commits
-
-
By Daniel Gustafsson
Commit a693c889 removed all callers of skipPadding, but left the function in, which generated a compiler warning. Fix by removing it.
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-
By Huiliang.liu
gpload uses gpversion.py to parse the gpdb version, so that it is compatible with both gpdb5 and gpdb6. This way we only need to maintain one gpload version, and new features or bug fixes can also be used by gpdb5 customers. So we package gppylib.gpversion into the gpdb clients tarball.
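The version handling above amounts to extracting the Greenplum major/minor version from the server's version() string. The sketch below is illustrative only; the function name and return shape are assumptions, not the real gppylib.gpversion API.

```python
import re

# Hypothetical parser in the spirit of gpversion.py (not its real API):
# a Greenplum server reports something like
#   "PostgreSQL 9.4.24 (Greenplum Database 6.4.0 build commit:xyz) ..."
def parse_gpdb_version(version_string):
    """Extract (major, minor) of the Greenplum version from version() output."""
    m = re.search(r'Greenplum Database (\d+)\.(\d+)', version_string)
    if not m:
        raise ValueError('not a Greenplum version string')
    return int(m.group(1)), int(m.group(2))
```

With such a parser, the same gpload code can branch on major version 5 vs 6 instead of shipping two copies of the tool.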
-
- 08 Feb 2020, 2 commits
-
-
By David Yozie
* Add gpss link
* Correct madlib typo
* Remove broken link (unneeded)
* Fix link to gptext/fts comparison
-
By Ashwin Agrawal
gpcheckcat hard-coded the master dbid to 1 for various queries. This assumption is flawed: there is no restriction that the master can only have dbid 1, it can be any value. For example, after failover to the standby, gpcheckcat is not usable with that assumption. Hence, find the value of the master's dbid at run time, using the fact that its content-id is always -1.
Co-authored-by: Alexandra Wang <lewang@pivotal.io>
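The corrected lookup can be sketched as follows. This is an illustrative model of the logic, not gpcheckcat's code: rows are modeled as (dbid, content, role) tuples in the shape of gp_segment_configuration, where content -1 marks the master/standby pair and role 'p' marks the acting primary.

```python
# Sketch of the fix: derive the master's dbid from the segment configuration
# (content == -1 identifies master/standby; role == 'p' the acting master)
# instead of hard-coding dbid 1.
def master_dbid(segment_config):
    """segment_config: iterable of (dbid, content, role) tuples."""
    for dbid, content, role in segment_config:
        if content == -1 and role == 'p':
            return dbid
    raise LookupError('no acting master found in segment configuration')
```

After a failover the acting master can carry any dbid (here 5), and the lookup still finds it where the hard-coded `1` would not.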
-
- 07 Feb 2020, 3 commits
-
-
By Ning Yu
-
By Ning Yu
We used to generate Ubuntu-only jobs even if ubuntu was not in the OS list; wrap them with OS checkers now.
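A minimal sketch of the guard described above, assuming a generator that emits one compile job per OS; the job names and the generator shape are made up for illustration, this is not the real pipeline template.

```python
# Illustrative sketch: only emit Ubuntu-specific jobs when 'ubuntu' is
# actually in the configured OS list (the "os checker" guard).
def generate_jobs(os_list):
    jobs = []
    for os_name in os_list:
        jobs.append('compile_%s' % os_name)
        if os_name == 'ubuntu':          # guard the ubuntu-only extras
            jobs.append('ubuntu_only_packaging')
    return jobs
```

Before the fix, the ubuntu-only extras were emitted unconditionally, producing jobs for an OS the pipeline was not configured to build.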
-
By Zhenghua Lyu
Previously the number of motions in subquery plans was not counted. This commit fixes that.
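The fix amounts to making the motion count recurse into subquery plans as well as regular children. The dict-based plan representation below is made up for illustration; it is not GPDB's plan tree structure.

```python
# Sketch of the fix: count Motion nodes recursively, including those in
# subquery plans, which the old code missed (plan shape is illustrative).
def count_motions(plan):
    """plan: dict with 'type', optional 'children' and 'subplans' lists."""
    total = 1 if plan['type'] == 'Motion' else 0
    for child in plan.get('children', []):
        total += count_motions(child)
    for sub in plan.get('subplans', []):   # previously not counted
        total += count_motions(sub)
    return total
```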
-
- 06 Feb 2020, 1 commit
-
-
By Hubert Zhang
Use 'select pg_catalog.gp_acquire_sample_rows(...)' instead of 'select * from pg_catalog.gp_acquire_sample_rows(...) as (...)' to avoid specifying the columns of the function's return value explicitly. The old style requires USAGE privilege on each column, which is not consistent with GPDB 5X. The following SQL fails the ACL check on master now:
revoke all on schema public from public;
create role gmuser1;
grant create on schema public to gmuser1;
create extension citext;
create table testid (id int, test citext);
alter table testid owner to gmuser1;
analyze testid;
Idea from Ashwin Agrawal <aagrawal@pivotal.io>
Idea from Taylor Vesely <tvesely@pivotal.io>
Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
-
- 05 Feb 2020, 5 commits
-
-
By Alexandra Wang
neededColumnContextWalker() is called to scan through Vars for the targetlist, quals, etc. It should only look at Vars for the table being scanned and ignore all other Vars. Currently, we are not aware of any plans which can produce a situation where neededColumnContextWalker() will encounter some other Vars, but in GPDB5 we get OUTER Vars here if an index scan is the right tree of a NestedLoop join. Hence, it seems better to have the protective code so we never write out of bounds. Also adds a test to cover the scenario, which was missing.
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Reviewed-by: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Asim R P <apraveen@pivotal.io>
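The protective check can be modeled like this. This is a Python sketch of the idea, not the C walker: the varno constants and the (varno, attno) representation are assumptions made for illustration.

```python
# Sketch of the guard: when collecting needed columns, accept only Vars that
# refer to the relation being scanned; skip OUTER/INNER references so the
# needed-columns array is never indexed out of bounds.
SCAN_VARNO = 1          # varno of the scanned relation (illustrative)
OUTER_VARNO = 65001     # placeholder for an OUTER-reference varno

def needed_columns(vars_seen, natts):
    """vars_seen: iterable of (varno, attno); returns a per-column bitmap."""
    needed = [False] * natts
    for varno, attno in vars_seen:
        if varno != SCAN_VARNO:
            continue                 # other relation or OUTER/INNER Var: skip
        if 1 <= attno <= natts:      # bounds check before writing
            needed[attno - 1] = True
    return needed
```

Without the `varno` filter, an OUTER Var with a large attno would write past the end of the array, which is exactly the out-of-bounds write the commit guards against.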
-
By Shreedhar Hardikar
Better cardinality estimation for citext in ORCA
-
By Mel Kiyama
* reorganize ddboost replication information.
  --move replication info. into separate topic.
  --update toc
* docs - updated docs based on review comments.
  --created sections for gpbackup and gpbackup_manager
  --added link to example config. files.
-
By Huiliang.liu
Use bin_gpdb_centos7 instead of bin_gpdb_centos7_rc as bin_gpdb in the test_gpdb_clients_windows job. Update the output file of the gpfdist_ssl test case, since the message changed after the external table refactor.
-
By Mel Kiyama
* docs - resource group support of runaway query detection
  Update GUC runaway_detector_activation_percent. Add cross references in:
  --Admin Guide resource group memory management topic
  --CREATE RESOURCE GROUP parameter MEMORY_AUDITOR
  This will be backported to 5X_STABLE.
* docs - minor edit
* docs - review comment updates
* docs - simplified description for resource groups
  --replaced requirement for vmtracker mem. auditor w/ admin_group and default_group
  --Added global shared memory example from Simon
* docs - created an Admin Guide section for resource group automatic query termination.
* docs - fix math error
-
- 04 Feb 2020, 3 commits
-
-
By Daniel Gustafsson
postgres_fdw was enabled by mistake in c9d4c1e5, but it should remain disabled as it's still undergoing work in order to function properly for Greenplum.
Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/Lepem3qwJGw
-
By Zhenghua Lyu
The isolation2/gdd test suite is for the global deadlock detector; a previous commit optimized it to run faster. gdd/end is used to reset the cluster's global-deadlock-detector state. It invokes the helper UDF pg_ctl to restart only the master's postmaster. pg_ctl may invoke test_postmaster_connection to check the restart status, and if that takes more time than the checking interval, it prints a different message, making this case flaky. Since at the end of gdd/end.sql we already have tests to make sure the cluster is in the correct condition, this commit fixes the flakiness by modifying the UDF pg_ctl: if it works correctly (the return value is 0) it just prints 'OK'; if something is wrong, it raises an exception containing the stdout and stderr for debugging.
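The stable-output contract of the modified helper can be sketched as below. The real helper is a SQL-callable UDF; this Python sketch only mimics its behavior, and the `runner` parameter is an assumption added so the logic is testable without a live cluster.

```python
import subprocess

# Sketch of the flakiness fix: swallow pg_ctl's timing-dependent progress
# chatter and report a stable 'OK' on success; on failure, raise with the
# full stdout/stderr so the test log has something to debug with.
def run_pg_ctl(argv, runner=subprocess.run):
    result = runner(argv, capture_output=True, text=True)
    if result.returncode == 0:
        return 'OK'                      # stable output regardless of timing
    raise RuntimeError('pg_ctl failed: stdout=%r stderr=%r'
                       % (result.stdout, result.stderr))
```

Because the success path always prints the same single token, the expected-output file for gdd/end no longer depends on how long the restart takes.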
-
By dyozie (https://github.com/greenplum-db/gpdb/pull/9414)
Squashed commit of the following docs analytics reorganization, authored by Lena Hunter <lhunter@pivotal.io>:
- e76c278c Merge branch 'feature/docs-analytics-2' of git://github.com/lenapivotal/gpdb into lenapivotal-feature/docs-analytics-2
- 98a67e11 edits from most recent review
- 0b203c8a link fixes
- 388d3f94 changes for 6.x -> 7.0 and link fixes
- 57eaafb1 fixing gp version
- cb4be6ad resolving conflicts
- 944bb49b adding menu times overview title change edits to intro for PL
- 09c0a07f edits from feedback
- f1ac6beb changes for new text.xml page
- 82221149 madlib page changes with diagram
- 4ef989ca small edit for DITA format
- e39cfbfa edits to phase 1 menu
- 5a88a930 edits to MADlib overview
- b3da2616 analytics edits
- ccdbdcb0 further edits to overview page
- db3679cd edits to gp analytics
- 54679c10 changes to analytics subject
- 40c4e35e updates to analytics work
- caa37b01 further edits to analytics
- 318a742a menu order change
- a01c62da added graphics folder for analytics
- c76c1b9d fixing xml error and image location/size
- 67c7d37f ditamaps and erb edits for menus
- b6be012d changes relating to ditamaps
- c3ac846d adding new overview page
- a6c37d7a testing image insert
- 3045f7d8 fixes to ref links
- 5693353a fixes for broken ref links
- d1b389f8 changes to analytics section
- 8c6a9b41 initial reorg of analytics work
-
- 31 Jan 2020, 11 commits
-
-
By Heikki Linnakangas
I initially thought that this was dead code, because you can't create a CHECK constraint on an external table normally. However, when you exchange an external table with a table partition, the partition's CHECK constraints, which check the partition boundaries, are applied to the external table. This fixes a regression failure in the 'partindex_test' test. Resurrect the old partition checking code, so that you get the same behavior as before. I'm not convinced this is really the best behavior, but this lets us move forward while we discuss what behavior we actually want.
Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/v-ZTreV0ud4/gZutaKYNDQAJ
-
By Heikki Linnakangas
This makes external tables less of a special case throughout the planner and executor. They're now mostly handled through the FDW API callbacks.
* Add a FDW handler function and implement the API routines to construct ForeignScan paths and Plan nodes, iterate through a scan, and to insert tuples to the external/foreign table. The API routines just call through to the existing external table functions, like external_getnext() and external_insert().
* Remove ExternalScan plan node in the executor and the ExternalScan path node from the planner. Use ForeignScan plan and path nodes instead, like for any other FDW. Move code related to external table planning to a new exttable_fdw_shim.c file.
* The parameters previously carried in the ExternalScan struct are now in a new ExternalScanInfo struct. (Or, the ExternalScan struct has been renamed to ExternalScanInfo, if you want to think of it that way.) It's not a plan node type anymore, but it still needs read/out function support so that the parameters can be serialized. Alternatively, the parameters could be carried in Lists of Values and other existing expression types, but a struct seems easier to handle. Perhaps the cleanest solution would be to use the ExtensibleNode infrastructure for it, but I'll leave that for another patch. As long as the external table FDW is in the backend anyway, it's simplest to have the out/read/copy funcs built-in as well.
* Modify ForeignScan executor node so that it can make up "fake ctids" like ExternalScan did, and also add "squelching" support to it.
* Remove special handling of external tables from ModifyTable node. It now uses the normal FDW routines for it. COPY still calls directly into the external_insert() function, because PostgreSQL doesn't support COPY into a foreign table until version 11. (We don't seem to have any tests for COPY TO/FROM an external table. TODO: add test.)
-
By Heikki Linnakangas
External tables now use relkind='f', like all foreign tables. They have an entry in pg_foreign_table, as if they belonged to a special foreign server called "exttable_server". That foreign server gets special treatment in the planner and executor, so that we still plan and execute it the same as before.
* ALTER / DROP EXTERNAL TABLE is now mapped to ALTER / DROP FOREIGN TABLE. There is no "OCLASS_EXTTABLE" anymore. This leaks through to the user in error messages, e.g.:
    postgres=# drop external table boo;
    ERROR:  foreign table "boo" does not exist
  and to the command tag on success:
    postgres=# drop external table boo;
    DROP FOREIGN TABLE
* psql \d now prints external tables as Foreign Tables.
Next steps:
* Use the foreign table API routines instead of special casing "exttable_server" everywhere.
* Get rid of the pg_exttable table, and store all the options in pg_foreign_table.ftoptions instead.
* Get rid of the extra fields in pg_authid that store permissions to create different kinds of external tables. Store them as ACLs in pg_foreign_server.
-
By Heikki Linnakangas
-
By Heikki Linnakangas
In PostgreSQL, ALTER TABLE is allowed where ALTER FOREIGN TABLE is. Allow external tables the same leniency.
-
By Heikki Linnakangas
It was set but never used.
-
By Heikki Linnakangas
The condition listed all possible values of relstorage, except for 'f' for RELSTORAGE_FOREIGN. The condition on relkind filters out foreign tables as well, so the condition on relstorage is redundant. (Although I don't think filtering out foreign tables was even the intention here.)
-
By Oliver Albertini
Now that we are moving to FDW, we can remove the old external-table-based PXF module. Since the PXF FDW currently only supports Greenplum (not stand-alone Postgres), `pxf_fdw` should live under `gpcontrib`.
Authored-by: Oliver Albertini <oalbertini@pivotal.io>
-
By Francisco Guerrero
Currently, every segment node retrieves metadata about the full list of fragments it is going to process, filters that list down to the fragments assigned to it, and then processes each fragment, one at a time. This operation can stress the external metadata servers when the Greenplum cluster is large, because every segment will connect to the external system at the same time to fetch metadata. An optimization was introduced in PXF to cache the metadata at the PXF Server level: when multiple segments were trying to access the same metadata, PXF would only issue one query to the external system. This helped improve the situation, but still, every segment host was fetching the same metadata. With Foreign Data Wrappers, this metadata query can be done in a single place, on master, and master can provide the information to the segments.
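The master-side distribution described above can be sketched as a single fetch followed by an assignment step. The round-robin policy and data shapes here are illustrative assumptions, not PXF's actual fragment-assignment algorithm.

```python
# Sketch of the optimization: the master fetches the fragment list once and
# hands each segment its share, so segments no longer all query the external
# metadata server themselves (round-robin assignment is illustrative).
def assign_fragments(fragments, num_segments):
    """Return {segment_id: [fragments]} computed once on the master."""
    assignment = {seg: [] for seg in range(num_segments)}
    for i, frag in enumerate(fragments):
        assignment[i % num_segments].append(frag)
    return assignment
```

One metadata round trip replaces N concurrent ones, which is the load reduction the commit is after.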
-
By Francisco Guerrero
PXF (Platform Extension Framework) provides access to external data in Greenplum [1]. Previously it was based on external tables; this commit introduces the FDW that can be used to communicate with PXF.
* Provide the skeleton for PXF Foreign Data Wrappers.
* Add pxf_fdw_version().
* Validation for WRAPPER, SERVER, USER-MAPPING, and FOREIGN TABLE options.
* Only build pxf_fdw when --enable_pxf is present. PXF is enabled by default, but this gives the user the option of turning off the building of the PXF contrib module with --disable-pxf.
* Integrate FDW with PXF's fragmenter and bridge. This allows a Greenplum user to create foreign tables that read from external data via PXF. See [2] for documentation. The only wire format (the format used to communicate between the PXF JVM and Greenplum segments) supported is TEXT for FDW. Other formats which come across as a binary stream of data (like Parquet) are not implemented. We validate and pass along the following options: header, delimiter, quote, escape, null, encoding, newline, fill_missing_fields, force_not_null, force_null, reject_limit, and reject_limit_type (percent/rows), and enforce precedence rules for `other_options`. For options other than protocol, resource, format, and reject_limit*, the following precedence rules apply:
  - Table options take precedence over all other options
  - User-Mapping options take precedence over server and wrapper options
  - Server options take precedence over wrapper options
* Add support for and validation of log_errors.
* Complete FDW write (master only). This is facilitated by externalizing a new function in COPY: `BeginCopy()`.
* Introduce a `config` option for SERVER. Access to Hadoop clusters is nuanced, because with a single set of configurations users are able to access HDFS, Hive, HBase and other services. Suppose an enterprise user has a Hortonworks hadoop installation that includes HDFS, Hive, and HBase. We would configure one server per technology we access, for example:
    CREATE SERVER hdfs_hdp FOREIGN DATA WRAPPER hdfs_pxf_fdw OPTIONS ( config 'hdp_1' );
    CREATE SERVER hive_hdp FOREIGN DATA WRAPPER hive_pxf_fdw OPTIONS ( config 'hdp_1' );
    CREATE SERVER hbase_hdp FOREIGN DATA WRAPPER hbase_pxf_fdw OPTIONS ( config 'hdp_1' );
  To reduce the amount of configuration required for each server, the new `config` option provides the name of the server directory where the configuration files reside. In the example above, the configuration files live in the `$PXF_CONF/servers/hdp_1` directory, and all three servers share the same configuration directory.
* Default wire_format to CSV. Only when the file format is tab-delimited text do we use TEXT as the wire_format.
* Add column projection.
* Pass filter strings (WHERE clauses) to PXF.
[1] https://github.com/greenplum-db/pxf
[2] https://github.com/greenplum-db/pxf/blob/pxf-fdw-d/PXF_FDW.md
Co-authored-by: Oliver Albertini <oalbertini@pivotal.io>
Co-authored-by: Raymond Yin <ryin@pivotal.io>
Co-authored-by: Francisco Guerrero <aguerrero@pivotal.io>
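The option-precedence rules listed above map naturally onto layered dictionary merging. This is an illustrative model of the rules, not PXF's validator code; the dict-based representation is an assumption.

```python
# Sketch of the precedence rules: table options override user-mapping
# options, which override server options, which override wrapper options.
def resolve_options(wrapper=None, server=None, user_mapping=None, table=None):
    resolved = {}
    # Apply the lowest-precedence layer first so later updates win.
    for opts in (wrapper, server, user_mapping, table):
        resolved.update(opts or {})
    return resolved
```

A layer only needs to set the options it wants to override; everything else falls through from the layers beneath it, such as the shared `config 'hdp_1'` on the server.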
-
By Francisco Guerrero
Currently, BeginCopyToForExternalTable allows external tables to hook into the COPY code for writing. Instead of adding a similar BeginCopyToForForeignTable function, we instead expose BeginCopy. This will allow pxf_fdw to get a CopyState for writing data from Greenplum to an external source through PXF.
Co-authored-by: Francisco Guerrero <aguerrero@pivotal.io>
Co-authored-by: Oliver Albertini <oalbertini@pivotal.io>
-
- 30 Jan 2020, 9 commits
-
-
By Heikki Linnakangas
Remove unnecessary includes that referenced 'currentSliceId', the files don't actually use it. Fix placement of local variable in ExecInitNode.
-
By Heikki Linnakangas
The Plan->memoryAccountId field was removed in commit 7c9cc053.
-
By Heikki Linnakangas
When executor nodes are initialized at executor startup, in the ExecInitPlan() stage, any nodes that are not going to be executed in the current slice were assigned to the so-called Alien memory account. Previously, that was done to keep the useless nodes out of the "real" memory balances, but nowadays we normally don't bother initializing alien nodes in the first place. Alien node elimination can be disabled with 'set execute_pruned_plan=off', but that's a developer option that people shouldn't normally mess with. So in normal operation, the Alien memory account is never used. The Alien memory account was kept around when the alien node elimination was implemented (see commit 9b8f5c0b). The idea was that we could turn it on/off, and see how much we're saving by looking at the memory usage in the Alien memory account when it's turned 'off'. But that's hardly interesting anymore; we know that alien node elimination is useful, and it has worked great in production for some time now. We could probably get rid of the 'execute_pruned_plan' GUC altogether at this point, but I kept it for now. If you do turn it off, all the alien nodes will now get their own memory accounts, like non-alien nodes.
Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
-
By Heikki Linnakangas
Commit 9936ca3b improved the cost model of multi-stage Aggregates, introducing a formula for estimating the number of distinct groups seen in each segment (the groupNumberPerSegment() function, later renamed to estimate_num_groups_per_segment()). However, we lost that with the rewrite of the multi-stage agg planning code in the 9.6 merge, and reverted to a more naive estimate. Put back the more accurate formula.
Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/zsl3m_Tcb1g/MCo7pY-vAgAJ
Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
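To illustrate the kind of estimate involved, here is a textbook occupancy-style formula for the expected number of distinct groups seen in one segment's share of the rows. This is a sketch of the general idea, not necessarily the exact formula used by estimate_num_groups_per_segment().

```python
# Occupancy estimate (illustrative, not GPDB's exact formula): drawing
# rows_per_segment rows uniformly from d distinct groups, the expected
# number of groups seen is d * (1 - (1 - 1/d) ** rows_per_segment).
def groups_per_segment(total_groups, total_rows, num_segments):
    rows_per_segment = total_rows / num_segments
    d = float(total_groups)
    return d * (1.0 - (1.0 - 1.0 / d) ** rows_per_segment)
```

The point of such a formula: with many rows per segment each segment sees nearly all groups, while with few rows per segment it sees far fewer, and a naive estimate ignores that difference.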
-
By Mel Kiyama
* docs - add python3 information to the PL/Container configuration example. Also some other minor updates and fixes.
* docs - updates based on review comments for PL/Container support of Python 3
* docs - minor edit
-
By Sambitesh Dash
-
By Sambitesh Dash
-
By Sambitesh Dash
-
By Sambitesh Dash
As of now ORCA doesn't support a multi-argument DQA, like:
SELECT distinct (a,b) from foo;
Earlier the planner didn't support it either, so we errored out in the parser itself. But now the planner supports it, so ORCA needs to handle the fallback.
Co-authored-by: Sambitesh Dash <sdash@pivotal.io>
-
- 29 Jan 2020, 2 commits
-
-
By Heikki Linnakangas
In a query with a single DISTINCT-qualified aggregate, the row count estimate for the bottom deduplication aggregate steps was taken from the overall aggregation's row count estimate. That could be dramatically different. For example, in this query from the regression tests:
> explain select sum(distinct b) from olap_test_single;
>                                                   QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=166.23..166.24 rows=1 width=8)
>    ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=166.19..166.22 rows=1 width=8)
>          ->  Partial Aggregate  (cost=166.19..166.20 rows=1 width=8)
>                ->  HashAggregate  (cost=166.16..166.19 rows=1 width=4)
>                      Group Key: b
>                      ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=165.00..165.99 rows=11 width=4)
>                            Hash Key: b
>                            ->  Streaming HashAggregate  (cost=165.00..165.33 rows=11 width=4)
>                                  Group Key: b
>                                  ->  Seq Scan on olap_test_single  (cost=0.00..115.00 rows=3334 width=4)
>  Optimizer: Postgres query optimizer
> (11 rows)
Before this patch, the Streaming HashAggregate at the bottom had a row count estimate of 1 row. One row is correct for the overall query, as an aggregate query with no GROUP BY always returns one row, but wildly incorrect for the deduplicating Streaming HashAggregate: it returns as many rows as there are distinct values. The aggregation that rolls them up to one row only happens in the Partial and Finalize Aggregate steps.
Reviewed-by: Taylor Vesely <tvesely@pivotal.io>
-
By Heikki Linnakangas
A number of small changes to improve the readability of the function:
- Introduce NUM_SAMPLE_FIXED_COLS constant for the number of "header" columns in the gp_acquire_sample_rows() result set.
- Rename 'rows' and 'numrows' fields to 'sample_rows' and 'num_sample_rows', to make it more clear that they refer to the rows in the sample, not to the rows in the function's result set. (The result set has one row per sample row, plus one summary row, emitted by each segment.)
- Rename 'natts' local variable to 'live_natts', to make it more clear that it does not include dropped cols.
- Remove 'natts' (live_natts) from the state struct. It can be computed from the number of output attributes.
- Remove redundant code to initialize output values/isnulls arrays for the summary row. They were initialized to all-NULLs twice, which is harmless but unnecessary and confusing.
Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
-