- 24 June 2020, 1 commit
-
-
Committed by Tyler Ramer

[Lockfile](https://pypi.org/project/lockfile/) has not been maintained since around 2015. Furthermore, the functionality it provided was poor: a review of the code showed that it used the presence of the PID file itself as the lock. On Unix, checking for a file's existence and then creating it is not atomic, so the lock was prone to race conditions. The lockfile package also did not clean up after itself: a process that died unexpectedly would not clear the locks it created, so some faulty logic was added to mainUtils.py that checked whether a process with the same PID as the lockfile's creator was running. This is obviously failure prone, as a new process might be assigned the same PID as the old lockfile's owner without actually being the same process. (Of note, the SIG_DFL argument to os.kill() is not a signal at all, but rather of type signal.handler. It appears that Python casts this handler to the int 0, which, according to man 2 kill, means no signal is sent but existence and permission checks are still performed. So it is a happy accident that this code worked at all.)

This commit removes lockfile from the codebase entirely. It also adds a PIDLockFile class that provides an atomicity-guaranteed lock via the mkdir and rmdir commands on Unix. It is therefore not safely portable to Windows, but this should not be an issue, as only Unix-based utilities use the simple_main() function. PIDLockFile provides API-compatible replacements for most of the functionality of lockfile.PidLockFile, but drops the timeout logic, as it was not used in any meaningful way: a hard-coded timeout of 1 second was used, where an immediate answer to whether the lock is held is sufficient.

PIDLockFile also includes appropriate __enter__, __exit__, and __del__ attributes, so that, should we extend this class in the future, the with syntax works, and __del__ calls release, so a process reaped unexpectedly still cleans up its own locks as part of garbage collection.

Authored-by: Tyler Ramer <tramer@pivotal.io>
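A minimal sketch of the mkdir-based locking idea described above (the class name and methods follow the commit's description; the actual PIDLockFile in the tree may differ in detail):

```python
import errno
import os


class PIDLockFile:
    """Advisory lock built on mkdir/rmdir, which are atomic on Unix."""

    def __init__(self, path):
        self.path = path  # the lock is the directory itself
        self._held = False

    def acquire(self):
        try:
            os.mkdir(self.path)  # atomic: fails if the directory already exists
        except OSError as e:
            if e.errno == errno.EEXIST:
                raise RuntimeError("lock is already held: %s" % self.path)
            raise
        # record our PID inside the lock directory for diagnostics
        with open(os.path.join(self.path, "PID"), "w") as f:
            f.write(str(os.getpid()))
        self._held = True

    def release(self):
        if self._held:
            os.remove(os.path.join(self.path, "PID"))
            os.rmdir(self.path)
            self._held = False

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()

    def __del__(self):
        # best-effort cleanup if the holder is garbage collected
        self.release()
```

Because release() is a no-op unless the lock is held, __del__ is safe to run after a normal release or after a failed acquire.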
-
- 23 June 2020, 2 commits
-
-
Committed by Tyler Ramer

PyGreSQL 5.2.0, which contains the fixes submitted and referenced in cb8d54a6, was released on June 21, 2020. Update the build process to use this tagged release rather than a pre-release hash.

Authored-by: Tyler Ramer <tramer@vmware.com>
-
Committed by Tyler Ramer

psutil 4.0.0 is quite old and only lists support for Python 3.4. We will need support for Python 3.6 and 3.8 as we update to Python 3.

Authored-by: Tyler Ramer <tramer@pivotal.io>
-
- 18 June 2020, 1 commit
-
-
Committed by Tyler Ramer

PyGreSQL may now be installed via pip or via Ubuntu apt. Update the Travis pipeline as well, using submodules to pull the necessary Python dependencies; they are therefore removed from the pip requirements as well.

Authored-by: Tyler Ramer <tramer@pivotal.io>
-
- 17 June 2020, 4 commits
-
-
Committed by Tyler Ramer

We encountered a bug in escaping dbname and connection options in PyGreSQL 5.1.2, for which we submitted a patch: https://github.com/PyGreSQL/PyGreSQL/pull/40 This has been merged, but it will take time to reach a tagged release. For this reason, we install from the source downloaded at commit https://github.com/PyGreSQL/PyGreSQL/commit/b1e040e989b5b1b75f42c1103562bfe8f09f93c3

Co-authored-by: Tyler Ramer <tramer@pivotal.io>
Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
-
Committed by Tyler Ramer

Due to the refactor of dbconn and newer versions of PyGreSQL, using `with dbconn.connect() as conn` no longer attempts to close the connection, even if it did before. Instead, this syntax uses the connection itself as the context and, as noted in execSQL, overrides execSQL's autocommit behavior. Therefore, close the connection manually to ensure that execSQL is auto-committed and the connection is closed.

Co-authored-by: Tyler Ramer <tramer@pivotal.io>
Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
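The same commit-but-don't-close behavior exists in the standard library's DB-API driver, so the pattern the commit adopts can be illustrated without Greenplum at all (sqlite3 here is an analogy, not the driver the commit touches):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The connection-as-context form commits on success (or rolls back on
# error) but does NOT close the connection.
with conn:
    conn.execute("CREATE TABLE t (n INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")

# Still open and usable after the with block...
rows = conn.execute("SELECT n FROM t").fetchall()

# ...so the connection must be closed manually, as the dbconn refactor does.
conn.close()
```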
-
Committed by Tyler Ramer

One reason PyGreSQL was previously modified was that it did not handle closing a connection very gracefully. In the process of updating PyGreSQL, we wrapped the connection it provides in a ClosingConnection function, which gracefully closes the connection when the `with dbconn.connect() as conn` syntax is used. This did, however, expose cases where a cursor created as the result of a dbconn.execSQL() call holds the connection open if not specifically closed. It is therefore necessary to remove the ability to get a cursor from dbconn.execSQL(). To highlight this difference, and to ensure that future use of this library is easy, I've cleaned up and clarified the dbconn execution code with the following changes:
- dbconn.execSQL() closes the cursor as part of the function and returns no rows
- dbconn.query() is added, which behaves like dbconn.execSQL() except that it returns a cursor
- dbconn.execQueryforSingleton() is renamed dbconn.querySingleton()
- dbconn.execQueryforSingletonRow() is renamed dbconn.queryRow()

Authored-by: Tyler Ramer <tramer@pivotal.io>
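The shape of the renamed API can be sketched over any DB-API connection (the function names come from the commit; the bodies below are illustrative stand-ins, not Greenplum's actual dbconn code):

```python
def execSQL(conn, sql):
    """Execute a statement; the cursor is closed internally, no rows returned."""
    cursor = conn.cursor()
    cursor.execute(sql)
    cursor.close()


def query(conn, sql):
    """Like execSQL, but returns the open cursor for the caller to consume."""
    cursor = conn.cursor()
    cursor.execute(sql)
    return cursor


def queryRow(conn, sql):
    """Return the single row of a single-row result."""
    cursor = query(conn, sql)
    rows = cursor.fetchall()
    cursor.close()
    if len(rows) != 1:
        raise ValueError("query did not return a single row")
    return rows[0]


def querySingleton(conn, sql):
    """Return the single value of a single-row, single-column result."""
    row = queryRow(conn, sql)
    if len(row) != 1:
        raise ValueError("query did not return a single column")
    return row[0]
```

The point of the split is visible in the signatures: only query() hands a live cursor to the caller, so only query() callers are responsible for closing it.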
-
Committed by Tyler Ramer

This commit updates PyGreSQL from 4.0.0 to 5.1.2, which requires numerous changes to take advantage of the major result-syntax change that PyGreSQL 5 implemented. Of note, cursors and query objects automatically cast returned values to the appropriate Python types: a list of ints, for example, instead of a string like "{1,2}". This accounts for the bulk of the changes. Updating to PyGreSQL 5.1.2 provides numerous benefits, including the following:
- CVE-2018-1058 was addressed in PyGreSQL 5.1.1
- We can save notices in the pgdb module, rather than relying on importing the pg module, thanks to the new set_notices()
- PyGreSQL 5 supports Python 3
- Thanks to a change in the cursor, using the "with" syntax guarantees a "commit" at the close of the with block

This commit is a starting point for additional changes, including refactoring the dbconn module. Additionally, since isolation2 uses PyGreSQL, some PL/Python scripts were updated, and isolation2 SQL output is further decoupled from PyGreSQL. The output of a psql command should be similar enough to isolation2's pg output that minimal or no modification is needed for gpdiff to recognize the output.

Co-authored-by: Tyler Ramer <tramer@pivotal.io>
Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
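To make the casting change concrete: under PyGreSQL 4, an `int[]` column came back as the raw literal and callers parsed it by hand, while PyGreSQL 5 returns a Python list directly. A rough sketch of the kind of manual parsing the upgrade removes (simplified: real array literals can also contain quoting, NULLs, and nesting, which this ignores):

```python
def parse_int_array(literal):
    """Parse a Postgres int[] literal such as '{1,2,3}' into a Python list.

    Simplified illustration of pre-PyGreSQL-5 client-side parsing; not
    a general array-literal parser.
    """
    inner = literal.strip().lstrip("{").rstrip("}")
    if not inner:
        return []
    return [int(item) for item in inner.split(",")]
```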
-
- 08 June 2020, 1 commit
-
-
Committed by Paul Guo

Use only the GUC gp_role now, folding the functionality of the GUC gp_session_role into it. Previously we had both GUCs. The difference between the two (copied from a code comment):

gp_session_role
- does not affect the operation of the backend, and
- does not change during the lifetime of a PostgreSQL session.

gp_role
- determines the operating role of the backend, and
- may be changed by a superuser via the SET command.

This is not friendly for coding. For example, Gp_role and Gp_session_role are both set to GP_ROLE_DISPATCH on the postmaster and many aux processes on all nodes (even QE nodes) in a cluster, so to tell a QD postmaster from a QE postmaster, current gpdb uses an additional -E option in the postmaster arguments. This confuses developers writing role-related branch code, given that there are three related variables, and some related code is even buggy now (e.g. 'set gp_role' FATAL quits). With this patch we have just gp_role. Some changes in the patch that might be interesting:
1. For the postmaster, specify '-c gp_role=' (e.g. via a pg_ctl argument) to determine the role, else the utility role is assumed.
2. For a stand-alone backend, the utility role is enforced (users need not specify it).
3. QE/QD nodes can still be connected in utility mode with PGOPTIONS, etc., as before.
4. Remove the '-E' gpdb hack and align '-E' usage with upstream.
5. Move pm_launch_walreceiver out of the FTS-related shmem, given that the latter is not used on QE.

Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
Reviewed-by: Gang Xiong <gxiong@pivotal.io>
Reviewed-by: Hao Wu <gfphoenix78@gmail.com>
Reviewed-by: Yandong Yao <yyao@pivotal.io>
-
- 03 June 2020, 1 commit
-
-
Committed by Wen Lin
-
- 29 May 2020, 2 commits
-
-
Committed by ggbq

In most cases the variable LN_S is 'ln -s'; however, configure changes LN_S to 'cp -pR' if it finds that the file system does not support symbolic links. The two are incompatible when linking a file in a subdirectory to a relative path, so cd into the subdirectory first before linking a file.
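The root of the incompatibility is that `ln -s` resolves a relative target against the link's own directory, while `cp -pR` resolves it against the current directory; running the command from inside the subdirectory makes the two agree. The symlink side of that asymmetry can be demonstrated in Python (Unix only):

```python
import os
import tempfile

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "subdir"))
with open(os.path.join(root, "target.txt"), "w") as f:
    f.write("hello")

# A relative target is stored verbatim and resolved against the link's
# directory: subdir/ has no target.txt, so this link dangles.
broken = os.path.join(root, "subdir", "broken")
os.symlink("target.txt", broken)

# Spelling the target relative to the link's directory works.
good = os.path.join(root, "subdir", "good")
os.symlink(os.path.join("..", "target.txt"), good)
```

A plain `cp -pR target.txt subdir/broken` run from `root` would have succeeded where the first symlink dangles, which is exactly the divergence the commit avoids by cd'ing first.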
-
Committed by Hubert Zhang

When introducing a new mirror, we need two steps: 1. start the mirror segment, 2. update the gp_segment_configuration catalog. Previously gp_add_segment_mirror was called to update the catalog, but the dbid chosen by get_availableDbId() is not guaranteed to match the dbid in internal.auto.conf. Reported in issue 9837.

Reviewed-by: Paul Guo <pguo@pivotal.io>
-
- 19 May 2020, 1 commit
-
-
Committed by Adam Lee
-
- 18 May 2020, 6 commits
-
- 15 May 2020, 1 commit
-
-
Committed by Wen Lin

While gpload is loading data, if the configuration file contains "error_table" and does not contain "preload", an error of no attribute "staging_table" or "fast_path" occurs.
-
- 14 May 2020, 9 commits
-
-
Committed by Tyler Ramer

I'm not quite sure of the purpose of this utility, nor, apparently, is any README or the historical repo. Apart from a small fix provided in commit 71d67305, there has been no modification to this file since at least 2008. More importantly, I'm not quite sure of any reasonable use for this file. The supported platforms are only linux, darwin, and sunos5, and the listed use, printing the memory size in bytes, is trivial on any of those systems without resorting to a Python script that wraps a command-line call. Given that it hasn't been updated since 2008, it's still compatible with some ancient version of Python, which means it's yet another file to upgrade to Python 3; in this case, let's drop the program rather than bother upgrading it.

Authored-by: Tyler Ramer <tramer@pivotal.io>
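For reference, the utility's one output, total physical memory in bytes, really is a one-liner today (shown with `os.sysconf`; these sysconf names are available on Linux but not on every platform):

```python
import os

# Total physical memory in bytes: page size times number of physical pages.
mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
print(mem_bytes)
```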
-
Committed by Ning Yu

The pg_partition_oid_index of template0 is used as a template for empty indexes; its path, however, is not fixed, so we need to determine it at runtime.
-
Committed by Ning Yu

It is no longer needed; the correct approach is to install meta-only index files on the new segments.
-
Committed by Ning Yu

An "empty" b-tree index file is not actually empty: it contains the meta page. By transferring meta-only index files to the new segments, they can be launched directly without the "ignore_system_indexes" setting, and we do not need an extra relaunch of the new segments. We use base/13199/5112, the pg_partition_oid_index of template0, as the template for meta-only index files.
-
Committed by Ning Yu

It was introduced to exclude a large number of paths. Also changed the exclusion logic for './db_dumps' and './promote': they were excluded only when an empty 'excludePaths' was specified by the caller, which is weird, so I changed the logic to always exclude these two paths.
-
Committed by Ning Yu

- Be careful when creating placeholders for the master-only files in the template: raise an error if they already exist.
- Increase code readability slightly.
-
Committed by Ning Yu

Gpexpand creates new primary segments by first creating a template from the master datadir and then copying it to the new segments. Some catalog tables are only meaningful on the master, such as gp_segment_configuration; their contents are cleared on each new segment with "delete from ..." commands. This works but is slow, because we have to include the contents of the master-only tables in the archive, distribute them via the network, and clear them via the slow "delete from ..." commands. (The "truncate" command is fast, but it is disallowed on catalog tables, as the filenode of a catalog table must not change.) To make this faster we now exclude these tables from the template directly, so less data is transferred and there is no need to "delete from" them explicitly.
-
Committed by Ning Yu

When cleaning up the master-only files on the new segments, we used to do the job one segment at a time; with tens or hundreds of segments this can be very slow. Now we clean up in parallel.
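The serial-to-parallel change can be sketched with the standard library (the segment list and the per-segment cleanup function below are placeholders, not gpexpand's real code):

```python
from concurrent.futures import ThreadPoolExecutor


def cleanup_segment(segment):
    """Placeholder for removing the master-only files on one segment."""
    return "cleaned %s" % segment


segments = ["seg%d" % i for i in range(8)]

# Before: one segment at a time.
#   results = [cleanup_segment(seg) for seg in segments]

# After: run the per-segment cleanups concurrently; map() preserves order.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(cleanup_segment, segments))
```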
-
Committed by Ning Yu

Removed the duplicated 'gp_segment_configuration' entry in the MASTER_ONLY_TABLES list. Also sorted the list alphabetically to prevent duplicates in the future.
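The sorted-and-unique invariant can even be asserted mechanically; a sketch with a hypothetical excerpt (the real MASTER_ONLY_TABLES contents live in the gpexpand sources):

```python
# Hypothetical excerpt of the list; only gp_segment_configuration is
# named in the commits above.
MASTER_ONLY_TABLES = [
    "gp_configuration_history",
    "gp_segment_configuration",
    "pg_statistic",
]

# A sorted, duplicate-free list compares equal to its sorted set.
assert MASTER_ONLY_TABLES == sorted(set(MASTER_ONLY_TABLES)), \
    "MASTER_ONLY_TABLES must be sorted and free of duplicates"
```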
-
- 12 May 2020, 1 commit
-
-
Committed by Peifeng Qiu

gpload in the latest Windows client package requires the VS redistributable package. Output a more meaningful message if pg.py fails to load.
-
- 05 May 2020, 1 commit
-
-
Committed by Tyler Ramer

This commit removes any reference to vendored Python, formerly installed in $GPHOME/ext/python. Unvendoring Python means that a system Python 2.7 is required. To make this possible, several sub-fixes or testing-scope fixes are required:
- Python requirements should be installed globally using pip
- References to PYTHONHOME are removed
- PYTHONPATH becomes "$GPHOME/lib/python:${PYTHONPATH}"

GCC is no longer overridden as part of the gpAux makefile process:
- Previously, the gpAux Makefile overrode the $CC variable with the value "gcc". This breaks convention, which is itself a problem, but it is also broken because the top-level Makefile and configure DO respect a CC variable being set.
- Setting CC="gcc" also means that a gcc binary must be on the user's path. This is not a requirement or a guarantee for compiling, so why keep this behavior?
- However, Python packages should be compiled with the same GCC version that compiled system Python; thus, we unset the CC variable when installing additional Python libraries. Specifically, the configure args used to compile Python are saved and reused to compile libraries when the python setup.py build process is used. So if system Python was built with compiler flags that a newer version of gcc no longer accepts, the library build fails. This is specifically of note when compiling Python libraries on SLES, where the system Python compiled with GCC 4.8 uses the `-fstack-clash-protection` flag, which is replaced by `-fstack-protector` in newer GCC versions. The configure args passed thus cause a compile failure with an "unrecognized command line option" error if a newer gcc version is used.

This does make significant improvements to simplify the code building and testing framework:
- The patchelf requirement goes away, as virtualenv is no longer necessary
- There is no need to copy system or other Python into $GPHOME/etc/python

This commit does not address any of the following:
- Unvendoring individual Python libraries, like psutil, pygresql, or yaml
- Updating any Python code to work with Python newer than 2.7

Co-authored-by: Tyler Ramer <tramer@pivotal.io>
Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
-
- 15 April 2020, 1 commit
-
-
Committed by Daniel Gustafsson
Various typos spotted in internal in-tree documentation.
-
- 03 April 2020, 1 commit
-
-
Committed by Ryan Zhang

Co-authored-by: Ryan <ryan@chapterx.com>
-
- 20 March 2020, 1 commit
-
-
Committed by (Jerome)Junfeng Yang

For ETL scenarios, some workloads frequently create and drop the same external table, and once the external table is dropped, all errors stored in its error log are lost. To make the error log persistent for external tables with the same "dbname"."namespace"."table", introduce the "error_log_persistent" external table option. If an external table is created with `OPTIONS (error_log_persistent 'true')` and `LOG ERRORS`, its error log is named "dbid_namespaceid_tablename" under the "errlogpersistent" directory, and dropping the external table does not delete the error log. Since GPDB 5 and 6 still use pg_exttable's options to mark LOG ERRORS PERSISTENTLY, keep the ability to load from OPTIONS(error_log_persistent 'true').

Create a separate `gp_read_persistent_error_log` function to read the persistent error log. If the external table has been deleted, only the namespace owner has permission to read the error log.

Create a separate `gp_truncate_persistent_error_log` function to delete the persistent error log; again, if the external table has been deleted, only the namespace owner has permission to delete the error log. It also supports wildcard input to delete the error logs belonging to a whole database or the whole cluster.

If an external table created with `error_log_persistent` is dropped, and the same "dbname"."namespace"."table" external table is then created without the persistent error log option, errors are written to the normal error log; the persistent error log still exists.

Reviewed-by: Haozhou Wang <hawang@pivotal.io>
Reviewed-by: Adam Lee <ali@pivotal.io>
-
- 18 March 2020, 1 commit
-
-
Committed by Jamie McAtamney

Previously, gpinitsystem incorrectly filled the hostname field of each segment in gp_segment_configuration with the segment's address. This commit changes it to correctly resolve hostnames and update the catalog accordingly.

Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
-
- 14 March 2020, 2 commits
-
-
Committed by Ashuka Xue

Previously, analyzedb would fail with an error if a table was dropped while analyzedb was running. Now, we silently skip dropped tables when determining the tables to analyze.
-
Committed by Adam Berlin

gpinitsystem did not quote the username when performing ALTER USER. When the username is a numeric value, the postgres parser rejects it unless the username is quoted. See https://www.postgresql.org/docs/9.4/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS for more details:
- SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_).
- There is also a second kind of identifier: the delimited identifier or quoted identifier, formed by enclosing an arbitrary sequence of characters in double quotes (").

This commit:
- uses the variable interpolation provided by psql to properly quote user-provided values.
- uses RETVAL to perform testing, due to commit d7b7a40a.

Co-authored-by: Jacob Champion <pchampion@pivotal.io>
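The delimited-identifier rule can be sketched as a small quoting helper (illustrative only; the actual fix uses psql's variable interpolation in the gpinitsystem shell script, not Python):

```python
def quote_ident(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"%s"' % name.replace('"', '""')


# A purely numeric username is invalid as a bare identifier, but is
# accepted once it becomes a delimited identifier.
sql = "ALTER USER %s PASSWORD 'secret'" % quote_ident("12345")
```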
-
- 13 March 2020, 1 commit
-
-
Committed by Chris Hajas

Previously, running analyzedb with an input file (`analyzedb -f <config_file>`) containing a root partition would fail because we did not properly populate the list of leaf partitions. The logic in analyzedb assumes that we enumerate leaf partitions from the root partition the user supplied (either on the command line or in an input file). While we did this properly when the table was passed on the command line, for input files we looked up the bare table name rather than the schema-qualified name. This caused partitioned heap tables to fail when writing the report/status files at the end, and caused analyzedb not to track DML changes in partitioned AO tables. Now, we properly check for the schema-qualified table name.
-
- 10 March 2020, 1 commit
-
-
Committed by Heikki Linnakangas

It was only used for one message in gprecoverseg, and it doesn't seem important. The second argument to the function hasn't done anything since the removal of email and SNMP alerts in commit 65822b80, and the NULL checks for the arguments were pointless because the function was marked strict. But rather than clean those up, let's just remove it altogether.

Reviewed-by: Asim R P <apraveen@pivotal.io>
Reviewed-by: Jimmy Yih <jyih@pivotal.io>
-
- 09 March 2020, 1 commit
-
-
Committed by Heikki Linnakangas

'modcount' is no longer kept up to date in the QD node, so we need to sum it up across all the segments. The analyzedb tests on the concourse master pipeline were failing because modcount always came out as 0.
-