Commits · f71749667c94415b9ef2cf761ef365ed00937a3a · Greenplum / Gpdb

09 Nov, 2020 · 1 commit

Committed by xiaoxiao on Nov 09, 2020

* refactor gpload test file TEST.py

1. migrate gpload tests to pytest
2. add a new function that forms the config file through a YAML package, making it more reasonable
3. add a case to cover the gpload update_condition argument

* migrate gpload and TEST.py to Python 3.6
new test case 43 tests gpload behavior when a column name has capital letters and no data type
change some .ans files since psql output differs

* change the SQL that finds a reusable external table to make gpload compatible with GP7 and GP6
improve TEST.py to write the config file with the ruamel.yaml module

Co-authored-by: XiaoxiaoHe <hxiaoxiao@vmware.com>

f7174966

25 Sep, 2020 · 5 commits

Update gpload code and tests for Python 3 · 3c124b2f

Committed by Jamie McAtamney on Sep 24, 2020

Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>

Update utilities code to work with Python 3 · 78f5cf43

Committed by Jamie McAtamney on Sep 24, 2020

This commit makes several broad changes to address conversion issues common to
multiple utilities:

- The input and output of subprocess in Python 3 are now bytestrings instead
  of strings, so some sanitizing of inputs and outputs is necessary.

- Several Python 2 built-ins and special methods, like raw_input and __cmp__,
  no longer exist in Python 3, and as a side effect list sorting and hashing
  work differently, requiring a different set of helper functions.

- Implicit relative imports no longer work, so dbconn (in utilities code) and
  mgmt_utils (in test code) must be added to the search path and imported using
  a full path instead.

- File objects require flush methods in Python 3, and the popen2 module has
  been removed.

Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
Co-authored-by: Tyler Ramer <tramer@vmware.com>
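
Two of the conversions above can be sketched generically. This is a minimal illustration, not code from the commit: `run_and_decode` and `compare_versions` are hypothetical names. The sketch decodes subprocess output from bytes to str, and wraps an old-style comparison function with `functools.cmp_to_key`, since `sorted()` no longer accepts a `cmp=` argument in Python 3.

```python
import functools
import subprocess
import sys


def run_and_decode(args):
    """Run a command and return its stdout as str (hypothetical helper).

    In Python 3 the pipe yields bytes, so the output is decoded explicitly
    before it is compared with or concatenated to str values.
    """
    result = subprocess.run(args, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8")  # result.stdout is bytes


def compare_versions(a, b):
    """Old-style cmp function: negative, zero, or positive (hypothetical)."""
    return (a > b) - (a < b)


# Python 3 removed the cmp= argument of sorted()/list.sort(), so an old
# comparison function is wrapped with functools.cmp_to_key instead.
versions = [(6, 0), (5, 28), (7, 0)]
ordered = sorted(versions, key=functools.cmp_to_key(compare_versions))

if __name__ == "__main__":
    print(run_and_decode([sys.executable, "-c", "print('hello')"]).strip())
    print(ordered)  # [(5, 28), (6, 0), (7, 0)]
```

Passing `universal_newlines=True` (also spelled `text=True` from Python 3.7) to `subprocess.run` is an alternative to decoding by hand.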

Remove subprocess32 · 6071056d

Committed by Tyler Ramer on Sep 24, 2020

The subprocess32 package is a backport of Python 3 subprocess functionality to
Python 2, so with the upgrade to Python 3 it is no longer necessary.

This commit deletes the package from pythonSrc and changes import statements to
import subprocess directly, instead of falling back to it only if subprocess32
is not importable.

Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
Co-authored-by: Tyler Ramer <tramer@vmware.com>

Allow GPDB to build and test with Python 3 · 7306abea

Committed by Tyler Ramer on Sep 24, 2020

- Update Python file shebangs to use python3 and update gp_replicate_check and
  gpversion.py to allow running under Python 3

- Use CentOS 7 dev containers with Python 3 and pip3 installed for testing, as
  prod containers do not yet work with Python 3, and update Travis with Python 3

- Install dependencies with pip3 to get Python 3-compatible versions

- Copy the Python 3 version of .so files, don't unset PYTHONHOME and PYTHONPATH,
  and don't remove built files from install locations, so that the Python 2 and
  Python 3 versions of various files can coexist

Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
Co-authored-by: Kris Macoskey <kmacoskey@vmware.com>
Co-authored-by: Tyler Ramer <tramer@vmware.com>

Run 2to3 against Python code · 54a65573

Committed by Jamie McAtamney on Sep 24, 2020

The 2to3 utility is an officially supported script to automatically convert
Python 2 code to Python 3. It's not a complete fix by any means, but it
handles most basic syntax transformations and similar changes.

This commit is the result of running 2to3 against every Python file in the
gpMgmt directory, so it's quite large and fairly scattershot. Manual updates
to any code that 2to3 can't handle will come in later commits.

16 Sep, 2020 · 1 commit

Fix gpload fail when capital letters in column in merge mode (#10804) · 4f7d02d8

Committed by xiaoxiao on Sep 16, 2020

* add double quotation marks around column names when creating the staging table;
omit the distribution key

* fix gpload failure when column names have capital letters in merge mode

Co-authored-by: XiaoxiaoHe <hxiaoxiao@vmware.com>
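
The underlying issue is identifier case folding: PostgreSQL and Greenplum fold unquoted identifiers to lower case, so a column created as "UserId" only matches when it is double-quoted. A minimal sketch of quoting identifiers when building staging-table DDL; `quote_identifier` and `staging_table_ddl` are hypothetical helpers, not gpload's actual functions, and the distribution key is simply left out as in the message above.

```python
def quote_identifier(name):
    """Return name as a double-quoted SQL identifier (hypothetical helper).

    Quoting preserves capital letters and special characters; an embedded
    double quote is escaped by doubling it.
    """
    return '"' + name.replace('"', '""') + '"'


def staging_table_ddl(table, columns):
    """Build CREATE TABLE DDL with every identifier quoted.

    `columns` is a list of (column_name, type_name) pairs; type names are
    inserted verbatim. No DISTRIBUTED BY clause is generated.
    """
    column_defs = ", ".join(
        f"{quote_identifier(col)} {typ}" for col, typ in columns
    )
    return f"CREATE TABLE {quote_identifier(table)} ({column_defs})"


print(staging_table_ddl("staging", [("Id", "int"), ("User Name", "text")]))
# CREATE TABLE "staging" ("Id" int, "User Name" text)
```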

07 Sep, 2020 · 2 commits

Revert "fix gpload upper letters in column in merge mode (#10763)" (#10776) · a27e5115

Committed by xiaoxiao on Sep 07, 2020

This reverts commit 1060a425.

fix gpload upper letters in column in merge mode (#10763) · 1060a425

Committed by xiaoxiao on Sep 07, 2020

* fix gpload failure when column names have capital letters in merge mode

add double quotation marks around column names when creating staging tables;
omit the distribution key

Co-authored-by: XiaoxiaoHe <hxiaoxiao@vmware.com>

31 Aug, 2020 · 1 commit

fix gpload multi-level partition table and special char in columns issue (#10686) · d80ec3a5

Committed by xiaoxiao on Aug 31, 2020

fix the match column condition to resolve a primary key conflict when using gpload
merge mode to import data into a multi-level partition table
fix failure when column names contain special characters and capital letters

Co-authored-by: XiaoxiaoHe <hxiaoxiao@vmware.com>

08 Jul, 2020 · 1 commit

Update PyYAML to 5.3.1 · 8d6c3059

Committed by Tyler Ramer on Jun 23, 2020

The version of PyYAML vendored in gpMgmt/bin/ext is old, unmaintained,
and does not support python3. Actually, it does not even contain a
`__version__` attribute, so it is not possible to know the version.

We need to unvendor YAML and get to a library version that supports
python3 - for this reason, we are updating to the latest PyYAML
available.

Also replace yaml.load with yaml.safe_load.

Co-authored-by: Tyler Ramer <tramer@vmware.com>
Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>

17 Jun, 2020 · 1 commit

Update PyGreSQL from 4.0.0 to 5.1.2 · f5758021

Committed by Tyler Ramer on May 26, 2020

This commit updates PyGreSQL from 4.0.0 to 5.1.2, which requires
numerous changes to take advantage of the major result syntax change
that PyGreSQL 5 implemented. Notably, cursors or query objects
automatically cast returned values to the appropriate Python types (a list
of ints, for example, instead of a string like "{1,2}"). This is the bulk of
the changes.

Updating to PyGreSQL 5.1.2 provides numerous benefits, including the
following:

- CVE-2018-1058 was addressed in PyGreSQL 5.1.1

- We can save notices in the pgdb module, rather than relying on importing
  the pg module, thanks to the new "set_notices()"

- PyGreSQL 5 supports Python 3

- Thanks to a change in the cursor, using the "with" syntax guarantees a
  "commit" at the close of the with block.

This commit is a starting point for additional changes, including
refactoring the dbconn module.

Additionally, since isolation2 uses pygresql, some pl/python scripts
were updated, and isolation2 SQL output is further decoupled from
pygresql. The output of a psql command should be similar enough to
isolation2's pg output that minimal or no modification is needed to
ensure gpdiff can recognize the output.

Co-authored-by: Tyler Ramer <tramer@pivotal.io>
Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>

03 Jun, 2020 · 1 commit

Add "FILL_MISSING_FIELDS" option for gpload. · cb76c301

Committed by Wen Lin on Jun 03, 2020

15 May, 2020 · 1 commit

Fix the gpload error that have no attribute "staging_table" or "fast_path" · 524f3105

Committed by Wen Lin on May 15, 2020

While gpload is loading data, if the configuration file contains "error_table" but does not contain "preload", an error saying there is no attribute "staging_table" or "fast_path" occurs.

12 May, 2020 · 1 commit

[skip-ci] gpload: improve error message (#10076) · 32eadd31

Committed by Peifeng Qiu on May 12, 2020

gpload in the latest Windows client package requires the Visual Studio (VS)
redistributable package. Output a more meaningful message if pg.py fails to load.
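
A minimal sketch of turning a bare import failure into a more actionable message. `import_or_explain` and its hint text are hypothetical; the sketch relies only on the fact that a compiled extension module whose runtime DLL is missing fails with an `ImportError`, which is what the `except` clause catches.

```python
import importlib
import sys


def import_or_explain(module_name, hint):
    """Import module_name, or print a hint to stderr and return None.

    Hypothetical helper: the caller decides whether a None result is fatal.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the original error text and add the actionable hint after it.
        sys.stderr.write(f"error: cannot load {module_name}: {exc}\n{hint}\n")
        return None
```

For example, a caller might use `import_or_explain("pg", "Hint: install the Visual Studio redistributable package.")` and exit if the result is `None`.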

03 Apr, 2020 · 1 commit

removed wrong errormsg which shows on Windows in gpload.py (#9867) · 4d386383

Committed by Ryan Zhang on Apr 03, 2020

Co-authored-by: Ryan <ryan@chapterx.com>

20 Mar, 2020 · 1 commit

Enable external table's error log to be persistent for ETL. (#9757) · 04fdd0a6

Committed by (Jerome)Junfeng Yang on Mar 20, 2020

For ETL user scenarios, there are some cases that may frequently create
and drop the same external table. Once the external table is dropped,
all errors stored in its error log are lost.

To make the error log persistent for external tables with the same
"dbname"."namespace"."table", bring in the "error_log_persistent" external
table option. If the external table is created with
`OPTIONS (error_log_persistent 'true')` and `LOG ERRORS`, its error log is
named "dbid_namespaceid_tablename" under the "errlogpersistent" directory,
and dropping the external table does not delete the error log.

Since GPDB 5 and 6 still use pg_exttable's options to mark
LOG ERRORS PERSISTENTLY, keep the ability to load from
OPTIONS (error_log_persistent 'true').

Create a separate `gp_read_persistent_error_log` function to read the
persistent error log. If the external table has been dropped, only the
namespace owner has permission to delete the error log.

Create a separate `gp_truncate_persistent_error_log` function to delete the
persistent error log. If the external table has been dropped, only the
namespace owner has permission to delete the error log. It also supports
wildcard input to delete the error logs belonging to a database or to the
whole cluster.

If an external table created with `error_log_persistent` is dropped, and an
external table with the same "dbname"."namespace"."table" is then created
without a persistent error log, errors are written to the normal error log.
The persistent error log still exists.

Reviewed-by: HaozhouWang <hawang@pivotal.io>
Reviewed-by: Adam Lee <ali@pivotal.io>

28 Feb, 2020 · 1 commit

Add max_retries flag for gpload (#9606) · b891b85b

Committed by Huiliang.liu on Feb 28, 2020

Add max_retries flag for gpload. It sets the maximum number of times gpload retries when connecting to GPDB times out.
The default value of max_retries is 0, which means no retry.
If max_retries is -1 or any other negative value, gpload retries forever.

Test has been done manually.
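
The retry semantics above can be sketched as a loop. This is an illustration, not gpload's code: `connect_with_retries` is hypothetical, and it assumes that a timed-out connection attempt surfaces as Python's built-in `TimeoutError`.

```python
import time


def connect_with_retries(connect, max_retries=0, delay=0.0):
    """Call connect() and retry on timeout according to max_retries.

    max_retries == 0: a single attempt, no retry (the default).
    max_retries > 0:  up to max_retries additional attempts.
    max_retries < 0:  retry forever.
    Assumption: a timed-out attempt raises the built-in TimeoutError.
    """
    retries = 0
    while True:
        try:
            return connect()
        except TimeoutError:
            if max_retries >= 0 and retries >= max_retries:
                raise  # retry budget exhausted
            retries += 1
            time.sleep(delay)  # optional pause between attempts
```

With `max_retries=2`, a connection that times out twice and then succeeds is attempted three times in total.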

31 Jan, 2020 · 2 commits

Change the catalog representation for external tables. · b62e0601

Committed by Heikki Linnakangas on Jan 31, 2020

External tables now use relkind='f', like all foreign tables. They have
an entry in pg_foreign_table, as if they belonged to a special foreign
server called "exttable_server". That foreign server gets special
treatment in the planner and executor, so that we still plan and execute
it the same as before.

* ALTER / DROP EXTERNAL TABLE is now mapped to ALTER / DROP FOREIGN TABLE.
  There is no "OCLASS_EXTTABLE" anymore. This leaks through to the user
  in error messages, e.g.:

    postgres=# drop external table boo;
    ERROR:  foreign table "boo" does not exist

  and to the command tag on success:

    postgres=# drop external table boo;
    DROP FOREIGN TABLE

* psql \d now prints external tables as Foreign Tables.

Next steps:
* Use the foreign table API routines instead of special casing
  "exttable_server" everywhere.

* Get rid of the pg_exttable table, and store all the options in
  pg_foreign_table.ftoptions instead.

* Get rid of the extra fields in pg_authid to store permissions to
  create different kinds of external tables. Store them as ACLs in
  pg_foreign_server.

Remove unnecessary condition on relstorage from gpload. · 4f10a873

Committed by Heikki Linnakangas on Jan 31, 2020

The condition listed all possible values of relstorage, except for
'f' for RELSTORAGE_FOREIGN. The condition on relkind filters out foreign
tables as well, so the condition on relstorage is redundant. (Although
I don't think filtering out foreign tables was even the intention here.)

01 Nov, 2019 · 1 commit

GPload: change metadata query SQL to improvement performance (#8904) · f6407f90

Committed by Huiliang.liu on Nov 01, 2019

GPload: change metadata query SQL to improvement performance
The old query SQL may take a long time if the catalog is large.

27 Sep, 2019 · 1 commit

catch ImportError for gpversion (#8709) · f249fd76

Committed by Huiliang.liu on Sep 27, 2019

20 Sep, 2019 · 1 commit

Ship subprocess32 and replace subprocess with it in python code (#8658) · 9c4a885b

Committed by Paul Guo on Sep 20, 2019

* Ship modified python module subprocess32 again

subprocess32 is preferred over subprocess according to the Python documentation.
In addition, we long ago modified the code to use vfork() instead of fork() to
avoid some "Cannot allocate memory" errors (false alarms, since memory is
actually sufficient) in GPDB production environments, which usually have
memory overcommit disabled. We used to compile and ship it, but later it was
only compiled and no longer shipped, probably due to a Makefile change (maybe
a regression). Let's ship it again.

* Replace subprocess with our own subprocess32 in python code.

26 Aug, 2019 · 2 commits

Fix gpload unit test case (#8498) · a905a1eb

Committed by Huiliang.liu on Aug 26, 2019

GPload supports GPDB5 and GPDB6 with the same gpload.py file (#8483) · 2fdc38ff

Committed by Huiliang.liu on Aug 26, 2019

* Get the GPDB version and support GPDB 5 and GPDB 6
* add gpversion.py to the Windows package

09 Jul, 2019 · 1 commit

Remove variable self-assignment in gpload · d6239081

Committed by Daniel Gustafsson on Jul 09, 2019

Setting a variable to itself is a no-op, which can be removed. This
may have been introduced in error and could be masking a real bug,
but if so, we have lived with it for two years, so I'm opting
for removing it.

Reviewed-by: Asim R P and Bhuvnesh Chaudhary

10 Apr, 2019 · 1 commit

Remove references to SunOS and HP-UX (#7356) · 52c37372

Committed by Ben Christel on Apr 09, 2019

We don't support Greenplum on these platforms.

Some files (e.g. Makefile.{hpux,solaris}) have been left in place
because they are upstream postgres files. Removing them isn't
worth the headache it would cause when merging commits from
postgres.

Authored-by: Ben Christel <bchristel@pivotal.io>

01 Feb, 2019 · 1 commit

Rename gp_distribution_policy.attrnums to distkey, and make it int2vector. · 69ec6926

Committed by Heikki Linnakangas on Feb 01, 2019

This is in preparation for adding operator classes as a new column
(distclass) to gp_distribution_policy. This naming is consistent with
pg_index.indkey/indclass. Change the datatype to int2vector, also for
consistency with pg_index, and some other catalogs that store attribute
numbers, and because int2vector is slightly more convenient to work with
in the backend. Move the column to the end of the table, so that all the
variable-length and nullable columns are at the end, which makes it
possible to reference the other columns directly in Form_gp_policy.

Add a backend function, pg_get_table_distributedby(), to deparse the
DISTRIBUTED BY definition of a table into a string. This is similar to
pg_get_indexdef_columns(), pg_get_functiondef() etc. functions that we
have. Use the new function in psql and pg_dump, when connected to a GPDB6
server.

Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Co-authored-by: Peifeng Qiu <pqiu@pivotal.io>
Co-authored-by: Adam Lee <ali@pivotal.io>

17 Jan, 2019 · 1 commit

Remove duplicate import and unused vars from gpload · d9bc848a

Committed by Daniel Gustafsson on Jan 17, 2019

This removes a duplicate import and a few set, but never used, vars
from the gpload.py code as well as the including_defaults token as
it was clearly unused.

Also fixes a few typos while in there, one of which is a user facing
error message.

Reviewed-by: Jacob Champion <pchampion@pivotal.io>

13 Dec, 2018 · 1 commit

Reporting cleanup for GPDB specific errors/messages · 56540f11

Committed by Daniel Gustafsson on Dec 13, 2018

The Greenplum specific error handling via ereport()/elog() calls was
in need of a unification effort, as some parts of the code were using a
different messaging style than others (and than upstream). This aims at
bringing many of the GPDB error calls in line with the upstream error
message writing guidelines and thus make the user experience of
Greenplum more consistent.

The main contributions of this patch are:

* errmsg() messages shall start with a lowercase letter, and not end
  with a period. errhint() and errdetail() shall be complete sentences
  starting with a capital letter and ending with a period. This attempts
  to fix this on as many ereport() calls as possible, with too detailed
  errmsg() content broken up into details and hints where possible.

* Reindent ereport() calls to be more consistent with the common style
  used in upstream and most parts of Greenplum:

	ereport(ERROR,
			(errcode(<CODE>),
			 errmsg("short message describing error"),
			 errhint("Longer message as a complete sentence.")));

* Avoid breaking messages due to long lines since it makes grepping
  for error messages harder when debugging. This is also the de facto
  standard in upstream code.

* Convert a few internal error ereport() calls to elog(). There are
  no doubt more that can be converted, but the low hanging fruit has
  been dealt with. Also convert a few elog() calls which are user
  facing to ereport().

* Update the testfiles to match the new messages.
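
The message-style rule in the first bullet is mechanical enough to check automatically. A minimal Python sketch of such a check, with hypothetical helper names (this is not a tool from the GPDB tree); it approximates "starts with a lowercase letter" as "does not start with an uppercase letter", since messages often begin with a quoted identifier.

```python
def errmsg_style_ok(msg):
    """errmsg(): must not start with an uppercase letter or end with a period."""
    return bool(msg) and not msg[0].isupper() and not msg.endswith(".")


def errhint_style_ok(msg):
    """errhint()/errdetail(): starts with an uppercase letter, ends with a period."""
    return bool(msg) and msg[0].isupper() and msg.endswith(".")


if __name__ == "__main__":
    print(errmsg_style_ok("short message describing error"))           # True
    print(errmsg_style_ok("Short message describing error."))          # False
    print(errhint_style_ok("Longer message as a complete sentence."))  # True
```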

Spelling and wording is mostly left for a follow-up commit, as this was
getting big enough as it was. The most obvious cases have been handled
but there is work left to be done here.

Discussion: https://github.com/greenplum-db/gpdb/pull/6378

Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>

30 Nov, 2018 · 1 commit

Remove unused local variable · 9347d895

Committed by Daniel Gustafsson on Nov 30, 2018

Reviewed-by: Jacob Champion <pchampion@pivotal.io>
Reviewed-by: Jimmy Yih <jyih@pivotal.io>

29 Nov, 2018 · 1 commit

Compare with None using the is operator · e39047b5

Committed by Daniel Gustafsson on Nov 29, 2018

While == None works for comparison, it goes through the __eq__ method,
which a class can override, and is slower than an identity check. Instead
move to using the "is" operator, which is the documented best practice for
Python code (PEP 8).

Reviewed-by: Jacob Champion
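
A short illustration of why `is` is the safer spelling: `==` dispatches to a class's `__eq__`, which can be overridden, while `is` compares object identity and cannot be. `AlwaysEqual` is a contrived example class, not something from gpload.

```python
class AlwaysEqual:
    """Contrived class whose __eq__ reports equality with anything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
print(obj == None)  # True: == calls the overridable AlwaysEqual.__eq__
print(obj is None)  # False: `is` checks identity with the None singleton
```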

14 Nov, 2018 · 1 commit

Add encoding option as condition of finding reusable table (#6151) (#6205) · 2c6567f2

Committed by Huiliang.liu on Nov 14, 2018

* Add external table encoding option as condition of finding reusable table

Get the database default encoding if ENCODING is not set in the config file.
Look up the encoding code from the encoding name, then add the encoding code as
one of the conditions for finding a reusable table.

30 Jul, 2018 · 1 commit

gpload: exit with os._exit to prevent hang (#5335) · f3e5c093

Committed by Peifeng Qiu on Jul 30, 2018

The gpload test case runs gpload with subprocess, reads stdout
and stderr from it, and waits for it to exit. sys.exit in gpload does some
cleanup that may cause a deadlock between the test and gpload. os._exit
exits immediately, but we need to flush stdout and stderr before
calling it.
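
A minimal sketch of the flush-then-`os._exit` pattern, with a child process standing in for gpload and the parent standing in for the test harness that reads its pipes. `hard_exit` and `run_child` are hypothetical names, and the child inlines the same steps as `hard_exit`. `os._exit` skips cleanup handlers and does not flush stdio buffers, which is why the flush comes first.

```python
import os
import subprocess
import sys


def hard_exit(code):
    """Flush stdout/stderr, then exit immediately without cleanup handlers."""
    sys.stdout.flush()
    sys.stderr.flush()
    os._exit(code)  # does not return; atexit handlers are skipped


def run_child():
    """Run a child that prints, flushes and hard-exits; return (rc, stdout)."""
    child_code = (
        "import os, sys\n"
        "print('load finished')\n"
        "sys.stdout.flush()  # without this the buffered line could be lost\n"
        "os._exit(3)\n"
    )
    proc = subprocess.run(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode, proc.stdout.decode()


if __name__ == "__main__":
    print(run_child())
```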

24 Jul, 2018 · 1 commit

Fix external schema bug in fast_match (#5324) · b2e38f17

Committed by Huiliang.liu on Jul 24, 2018

- The results of the fast_match SQL don't include the schema name, so we need to
add the schema name to extSchemaTable for fast_match
- Remove locationStr, which is unused.

23 Jul, 2018 · 1 commit

support fast_match option in gpload config file (#5310) · d240a284

Committed by Huiliang.liu on Jul 23, 2018

- add the fast_match option to the gpload config file. If both reuse_tables
and fast_match are true, gpload will try to fast-match the external
table (without checking columns). If reuse_tables is false and
fast_match is true, it will print a warning message.
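
The option interplay above can be sketched as a small decision function. Everything here is hypothetical (the function name and the returned labels), and the assumption that fast_match is simply ignored when reuse_tables is false goes beyond the commit message, which only says a warning is printed.

```python
import warnings


def choose_match_strategy(reuse_tables, fast_match):
    """Pick how to look for a reusable external table (hypothetical labels).

    - reuse_tables and fast_match: fast match, without checking columns;
    - fast_match without reuse_tables: warn, then behave as without it;
    - otherwise: full match when reusing tables, else no reuse at all.
    """
    if reuse_tables and fast_match:
        return "fast_match"
    if fast_match:
        warnings.warn("fast_match is ignored because reuse_tables is false")
    return "full_match" if reuse_tables else "no_reuse"
```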

23 Apr, 2018 · 1 commit

Remove customer names from gpload.py · 9c4c7158

Committed by Peifeng Qiu on Apr 23, 2018

03 Apr, 2018 · 1 commit

Get rid of pg_exttable.fmterrtbl · 8f6fe2d6

Committed by Adam Lee on Mar 12, 2018

The pg_exttable.fmterrtbl column stored the OID of the error table, but
without an error table it is just set to the OID of the external table.
That is not necessary, there are other columns which indicate if error
logging is enabled. Therefore this column can be removed.

27 Mar, 2018 · 1 commit

Use hard kill in gpload to avoid unexpected gpfdist hang (#4765) · 83fb63c0

Committed by Peifeng Qiu on Mar 27, 2018

When gpload finishes its query, it sends SIGTERM to gpfdist.
gpfdist handles SIGTERM with exit(1), which invokes the registered
APR handlers and cleans up all APR resources, including apr_pool. If
this happens during the normal destruction of apr_pool in
do_close, gpfdist will hang.

Call _exit in gpfdist to avoid any cleanup handlers, and let gpload
send SIGKILL to perform a hard kill.
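
On the gpload side, a hard kill from Python can be sketched with the standard library alone. This is POSIX-only (`signal.SIGKILL` does not exist on Windows), the sleeping child merely stands in for gpfdist, and the helper names are hypothetical; it is not how gpload actually launches gpfdist.

```python
import signal
import subprocess
import sys


def start_stand_in_server():
    """Start a long-running child process standing in for gpfdist."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


def hard_kill(proc):
    """Send SIGKILL, which the child cannot catch, and reap it.

    Returns the child's exit status; on POSIX a process killed by signal N
    reports -N, and no exit handlers run in the child.
    """
    proc.send_signal(signal.SIGKILL)  # same as proc.kill() on POSIX
    return proc.wait()


if __name__ == "__main__":
    server = start_stand_in_server()
    print(hard_kill(server))  # -9 on Linux
```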

26 Feb, 2018 · 1 commit

fix gpload bug about handling nullas option (#4583) · 2e330960

Committed by huiliang-liu on Feb 26, 2018

- if the data file contains "\N" as the null string, it would not be
recognized properly by gpload
- root cause: gpload quoted the nullas option value and also replaced
'\' with '\\'
- solution: add a quote_no_slash function to handle the nullas option
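
The difference between the two quoting behaviours can be sketched as follows. Both functions are illustrative assumptions modelled on the description above, not gpload's actual `quote_no_slash` implementation. In a standard (non-E) SQL string literal, the doubled form would make the null marker `\\N`, so the `\N` in the data would no longer match.

```python
def quote_escaping_backslashes(value):
    """Quote a SQL literal, doubling quotes AND backslashes (old behaviour).

    Hypothetical helper modelled on the root-cause description.
    """
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def quote_keeping_backslashes(value):
    """Quote a SQL literal, doubling only single quotes.

    Backslashes are kept as-is, so a NULL AS value of \\N stays \\N.
    """
    return "'" + value.replace("'", "''") + "'"


null_as = "\\N"  # the two characters backslash and N
print(quote_escaping_backslashes(null_as))  # '\\N'
print(quote_keeping_backslashes(null_as))   # '\N'
```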
