1. 23 Jan 2020 (14 commits)
    • Revert "basebackup: increase max count of --exclude args" · 7cd757d6
      Ning Yu committed
      This reverts commit 34803183.
    • Revert "gpexpand: correct scenario names and indents" · 24680df7
      Ning Yu committed
      This reverts commit cff087dc.
    • Revert "gpexpand: exclude master-only tables from the template" · 0348f712
      Ning Yu committed
      This reverts commit 62d53c21.
    • Revert "gppylib: remove duplicated entries in MASTER_ONLY_TABLES" · 844d4bd8
      Ning Yu committed
      This reverts commit e046859f.
    • GUC should be synchronized after change/restore. · c6079933
      Hubert Zhang committed
      A function declared with `SET search_path = t1` changes the
      search_path inside the function and restores it when the
      function finishes.

      After search_path is restored on the QD, we should also sync
      it to all the cached QEs.
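      A minimal sketch of the behavior being fixed; the schema and function
      names here are hypothetical:

      ```sql
      CREATE SCHEMA t1;
      CREATE FUNCTION f() RETURNS int AS $$
        SELECT 1;
      $$ LANGUAGE sql SET search_path = t1;

      SHOW search_path;  -- e.g. "$user",public
      SELECT f();        -- search_path is t1 inside f(), restored on return
      SHOW search_path;  -- restored on the QD; cached QEs must see it too
      ```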
      Reviewed-by: Weinan WANG <wewang@pivotal.io>
    • gppylib: remove duplicated entries in MASTER_ONLY_TABLES · e046859f
      Ning Yu committed
      Removed the duplicated 'gp_segment_configuration' entry in the
      MASTER_ONLY_TABLES list.  Also sorted the list in alphabetical order to
      prevent duplicates in the future.
    • gpexpand: exclude master-only tables from the template · 62d53c21
      Ning Yu committed
      Gpexpand creates new primary segments by first creating a template from
      the master datadir and then copying it to the new segments.  Some
      catalog tables, such as gp_segment_configuration, are only meaningful on
      the master; their contents are cleared on each new segment with
      "delete from ..." commands.
      
      This works but is slow: we have to include the content of the
      master-only tables in the archive, distribute them via network, and
      clear them via the slow "delete from ..." commands.  The "truncate"
      command is fast, but it is disallowed on catalog tables because their
      filenode must not change.
      
      To make it faster we now exclude these tables from the template
      directly, so less data is transferred and there is no need to "delete
      from" them explicitly.
    • gpexpand: correct scenario names and indents · cff087dc
      Ning Yu committed
      In the gpexpand behave tests we used to have the same name for multiple
      scenarios; now we give them distinct, descriptive names.
      
      Also correct some bad indents.
    • basebackup: increase max count of --exclude args · 34803183
      Ning Yu committed
      The --exclude option is a GPDB-specific option for pg_basebackup.  It
      specifies a path to exclude from the backup archive and can be provided
      multiple times to exclude multiple paths.
      
      We used to allow at most 255 excludes, which worked well in the past.
      Now, however, we plan to use the option to exclude the master-only
      catalog tables from the segment template of gpexpand, and we may exceed
      this limit easily: many catalog tables are per-database, so when there
      are enough databases we can have thousands of paths or more to exclude.
      
      Increase the limit to 65535, which should be enough in practice.  In the
      future we may want to remove the limit entirely, but for now we stick
      with a hard-coded value.
    • Bugfix: rows might be split into wrong partitions · 101922f1
      ggbq committed
      split_rows() scans tuples from T and routes them to the new parts (A, B)
      based on A's or B's constraints.  If T has one or more dropped columns
      before its partition key, T's partition key has a different attribute
      number from its new parts.  In this case the constraints check the wrong
      column, which can cause bad behavior.
      
      To fix it, each tuple iteration should reconstruct the partition tuple
      slot and assign it to econtext before the ExecQual calls.  The
      reconstruction can happen once or twice, because A and B may have two
      different tupdescs.
      
      One such bad behavior is that rows are split into the wrong partitions.
      To reproduce:
      
      ```sql
      DROP TABLE IF EXISTS users_test;
      
      CREATE TABLE users_test
      (
        id          INT,
        dd          TEXT,
        user_name   VARCHAR(40),
        user_email  VARCHAR(60),
        born_time   TIMESTAMP,
        create_time TIMESTAMP
      )
      DISTRIBUTED BY (id)
      PARTITION BY RANGE (create_time)
      (
        PARTITION p2019 START ('2019-01-01'::TIMESTAMP) END ('2020-01-01'::TIMESTAMP),
        DEFAULT PARTITION extra
      );
      
      /* Drop useless column dd for some reason */
      ALTER TABLE users_test DROP COLUMN dd;
      
      /* Forgot/Failed to split out new partitions beforehand */
      INSERT INTO users_test VALUES(1, 'A', 'A@abc.com', '1970-01-01', '2020-01-01 12:00:00');
      INSERT INTO users_test VALUES(2, 'B', 'B@abc.com', '1980-01-01', '2020-01-02 18:00:00');
      INSERT INTO users_test VALUES(3, 'C', 'C@abc.com', '1990-01-01', '2020-01-03 08:00:00');
      
      /* New partition arrives late */
      ALTER TABLE users_test SPLIT DEFAULT PARTITION START ('2020-01-01'::TIMESTAMP) END ('2021-01-01'::TIMESTAMP)
       INTO (PARTITION p2020, DEFAULT PARTITION);
      
      /*
       * - How many new users already in 2020?
       * - Wow, no one.
       */
      SELECT count(1) FROM users_test_1_prt_p2020;
      ```
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • cross-subnet: make gpexpand support cross-subnet expansion · cdd1e934
      Mark Sliva committed
      We update the pg_hba.conf file with replication entries for each
      hostname/address to enable cross-subnet cluster expansion. There are no tests
      for this change, but they can be added at a later time.
      Co-authored-by: Jacob Champion <pchampion@pivotal.io>
      Co-authored-by: Adam Berlin <aberlin@pivotal.io>
      Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
      Co-authored-by: David Krieger <dkrieger@pivotal.io>
      Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
    • cross-subnet: add its pipeline job · b74a20a9
      Mark Sliva committed
      We add a cli_cross_subnet job that creates a cross_subnet cluster, and then
      runs the cross_subnet behave tests. It tests that replication works for each of
      the affected cross-subnet utilities.
      
      We provision two CCP clusters in two different subnets, and the gpinitsystem task
      creates a cluster in which every segment pair (including master/standby)
      replicates across subnets.
      Co-authored-by: Jacob Champion <pchampion@pivotal.io>
      Co-authored-by: Adam Berlin <aberlin@pivotal.io>
      Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
      Co-authored-by: David Krieger <dkrieger@pivotal.io>
      Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
    • cross-subnet: fix replication on cross-subnet Greenplum Clusters · 79637980
      Mark Sliva committed
      The four CM utilities gpinitsystem, gpinitstandby, gpaddmirrors, and
      gpmovemirrors now have the relevant pg_hba.conf entries to allow WAL
      replication to mirrors from their respective primaries across subnets.
      
      There are two parts to this commit:
      1) modify the CM utilities to add the pg_hba.conf entries to
      allow WAL replication to mirrors across a subnet, and
      2) test the relevant CM utilities across subnets.
      
      The previous pg_hba.conf replication entry:
          'host replication $USER samenet trust'
      does not allow WAL replication connections across subnets. We keep this
      entry in order to support single-host development. To allow cross-subnet
      replication, we add to each new primary and mirror one replication line
      per primary and mirror interface address. It looks like:
          'host replication $USER $IP_ADDRESS trust'
      or, when HBA_HOSTNAMES=1:
          'host replication $USER $HOSTNAME trust'
      Further, if there is ever a failover and subsequent promotion,
      replication connections can be made to the newly promoted primary from
      the host on which the previous primary failed, because those addresses
      are copied over to the new mirror during a pg_basebackup. We also add
      similar logic to support cross-subnet replication between the master and
      standby. This behavior is tested in the cross_subnet behave tests.
      
      The cross_subnet behave tests assert that the replication connection is valid
      by manually making the connection in addition to relying on segments being
      synchronized, as a way to ensure that the pg_hba.conf file is being used.
      Co-authored-by: Jacob Champion <pchampion@pivotal.io>
      Co-authored-by: Adam Berlin <aberlin@pivotal.io>
      Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
      Co-authored-by: David Krieger <dkrieger@pivotal.io>
      Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
    • cross-subnet: add ifaddrs utility · 15a30510
      Mark Sliva committed
      We add a new internal utility, ifaddrs, that prints all of the host's
      interface addresses separated by newlines; it is used to discover the
      interface addresses used for replication. As an internal utility it is
      installed into $GPHOME/libexec. There is no Python 2 library that
      provides this functionality, so we add it ourselves.
      
      Also add a configure dependency on getifaddrs and inet_ntop, which are now
      required to build a functioning GPDB system. As far as we can tell, the
      other headers and functions are already handled through other configure
      checks.
      Co-authored-by: Jacob Champion <pchampion@pivotal.io>
      Co-authored-by: Adam Berlin <aberlin@pivotal.io>
      Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
      Co-authored-by: David Krieger <dkrieger@pivotal.io>
      Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
  2. 22 Jan 2020 (4 commits)
  3. 21 Jan 2020 (8 commits)
    • Fix CATALOG_VARLEN markings in header files. · 43690c89
      Heikki Linnakangas committed
      Some GPDB-added system tables, and columns in upstream catalogs, were
      missing CATALOG_VARLEN markings or had them wrong. It doesn't cause any
      ill effect, but it's a hazard if someone tries to access the fields
      through the Form_pg_* struct. Let's be tidy.
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
    • Remove obsolete optimized code from EncodeDateTime. · 8661e51e
      Heikki Linnakangas committed
      We had optimized this piece of code in Greenplum, but PostgreSQL made a
      similar optimization in 9.6 (commit aa2387e2). In the 9.6 merge, I
      picked up the PostgreSQL version but left the Greenplum version in a
      commented-out block, with the plan to do some performance testing to see
      whether we could switch to the PostgreSQL version.
      
      I did that performance testing now, and it seems that the old GPDB version
      was about the same speed as the new PostgreSQL version. I used this to
      test it:
      
          -- generate test data
          create table tstest  (distkey int4, ts timestamp) distributed by (distkey);
          insert into tstest
            select 1, g from generate_series(now(), now()+ '1 year', '1 second') g;
          vacuum tstest;
      
          -- test query
          \timing on
          select min(ts::text) from tstest;
      
      The query took about 15 seconds on my laptop, and 'perf' says that about
      10% of the CPU time was spent in EncodeDateTime, with both versions.
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
    • Fix commented-out code. · b49993a6
      Heikki Linnakangas committed
      I'm just about to remove it, but let's fix it first so that we have
      the fixed version in the git history, in case someone wants to revisit
      this.
    • Add sanity check for on conflict update to avoid wrong data distribution · 8bd2f1c2
      Zhenghua Lyu committed
      The statement `insert on conflict do update` will invoke update on the
      segments. If the ON CONFLICT update modifies the distribution keys of
      the table, this would lead to wrong data distribution.
      
      This commit avoids the issue by raising an error in transformInsertStmt
      if it finds that the ON CONFLICT update would touch the distribution
      keys of the table.
      
      Fixes github issue: https://github.com/greenplum-db/gpdb/issues/9444
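      A minimal sketch of the case that is now rejected (the table is
      hypothetical; note that in GPDB a unique index must include the
      distribution key):

      ```sql
      CREATE TABLE t (id int UNIQUE, val int) DISTRIBUTED BY (id);
      INSERT INTO t VALUES (1, 1);
      -- Updating the distribution key could leave the row on the wrong
      -- segment, so this now raises an error at parse analysis:
      INSERT INTO t VALUES (1, 2)
        ON CONFLICT (id) DO UPDATE SET id = 99;
      ```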
    • Fix potential global deadlock for upsert · 893c5293
      Zhenghua Lyu committed
      The statement `insert on conflict do update` may invoke ExecUpdate
      on the segments, so it should be treated as an update statement when
      choosing the lock mode.
      
      Fixes github issue https://github.com/greenplum-db/gpdb/issues/9449
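      For example, a statement of this shape (hypothetical table) now takes
      the same lock mode as a plain UPDATE:

      ```sql
      CREATE TABLE counters (id int UNIQUE, n int) DISTRIBUTED BY (id);
      -- May run ExecUpdate on the segments, so it is locked like an UPDATE
      -- to avoid a global deadlock between concurrent upserts:
      INSERT INTO counters VALUES (1, 0)
        ON CONFLICT (id) DO UPDATE SET n = counters.n + 1;
      ```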
    • Implementing multi-phase grouping sets. (#9219) · 4e00f481
      Richard Guo committed
      This work is based on the multi-phase aggregation in master. The idea is
      to perform the grouping sets aggregation in the partial phase and then
      perform normal aggregation in the final phase.
      
      In the partial phase, we attach a GROUPINGSET_ID to each transvalue. In
      the final phase, we include this GROUPINGSET_ID at the head of the sort
      keys and group keys for group aggregation. This ensures correctness in
      the presence of NULLs.
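      For example, a query like the following (hypothetical table) can now be
      split into a partial grouping-sets phase and a final phase keyed on the
      GROUPINGSET_ID:

      ```sql
      CREATE TABLE items_sold (brand text, size text, sales int)
        DISTRIBUTED BY (brand);
      -- NULLs produced by the grouping sets stay distinct from real NULL
      -- values because each transvalue carries its grouping set's id:
      SELECT brand, size, sum(sales)
      FROM items_sold
      GROUP BY GROUPING SETS ((brand), (size), ());
      ```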
    • Revert "ci: remove server-build-* resources from non-prod pipelines" · ea975b6e
      Ning Yu committed
      This reverts commit 6f9729c1.
    • ci: stick python modules to py2 compatible versions · 68a82378
      Ning Yu committed
      Python 2 reached its end of life on Jan 1st, 2020, and many upstream
      Python modules are in the process of dropping Python 2 support; their
      newer versions may not work for us.  We have encountered several such
      issues on the pipeline this year, so to make our lives easier we now pin
      the modules to Python 2 compatible versions.  For now we only pin the
      modules used by the concourse scripts; later we might want to pin more
      of those used by our utility scripts.
  4. 20 Jan 2020 (5 commits)
  5. 19 Jan 2020 (1 commit)
    • Fix gpperfmon tmid inconsistent across master and segments (#9450) · c9989a93
      Wang Hao committed
      The tmid should have the same value across the cluster; it increases
      only on a full gpdb cluster restart.  Tmid, gp_session_id, and
      gp_command_count together uniquely identify a single query execution,
      and monitoring agents such as gpperfmon rely on this uniqueness.
      
      Commit 58c7833d introduced a different way of computing gpmon_gettmid(),
      which brought in a problem: the tmid may differ between the master and
      the segments.  This commit fixes the problem.
      Reviewed-by: Ning Yu <nyu@pivotal.io>
  6. 17 Jan 2020 (5 commits)
  7. 16 Jan 2020 (1 commit)
    • ForgetRelationFsyncRequests in mdunlink for AO during crash recovery · 1223ffac
      Ashwin Agrawal committed
      On a primary segment in normal mode, the backend performs the fsync for
      an AO table itself and doesn't delegate the work to the checkpointer;
      hence it doesn't need to register or forget fsync requests for AO
      tables.  The TRUNCATE command currently shares the same code for heap
      and AO via ExecuteTruncate() -> heap_truncate_one_rel(): it writes a
      generic file-truncate WAL record and registers the request with the
      checkpointer process.  In normal mode, the UNLINK request is registered
      with the checkpointer process and the backend doesn't perform the unlink
      of the base file itself, irrespective of AO or heap, so this works fine.
      
      But during crash recovery we can't determine from the file-truncate
      record whether it's a heap or an AO file, so an fsync request is
      registered either way.  If mdunlink() then skips sending the forget
      fsync request for AO and instead directly unlinks the file, it causes a
      PANIC with "could not fsync file.....No such file or directory".
      
      To avoid this situation and have defensive code, it is better to always
      forget the relation fsync request during crash recovery in mdunlink(),
      irrespective of the storage type.  This makes the system resilient in
      the presence of any such issue, as the only way to recover from "could
      not fsync file.....No such file or directory" during crash recovery is
      to reset the xlogs, which is very dangerous and causes data loss.  In
      normal mode we continue to avoid having mdunlink() forget the fsync
      request for AO, just to avoid overwhelming the fsync request queue.
      
      In the longer term, it is probably better to have separate truncate code
      for AO and heap, and not to use a generic file-truncate record for both.
      But regardless, as stated above, it seems better to have this change.
      Reviewed-by: Asim R P <apraveen@pivotal.io>
      Reviewed-by: Paul Guo <pguo@pivotal.io>
  8. 15 Jan 2020 (2 commits)