- 23 Jan 2020, 14 commits
-
-
Committed by Hubert Zhang
A function with `SET search_path = t1` changes the search_path inside the function but restores it after the function finishes. After search_path is restored on the QD, we should also sync it to all the cached QEs. Reviewed-by: Weinan WANG <wewang@pivotal.io>
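A minimal sketch of the pattern in question; the schema, table, and function names here are illustrative, not taken from the commit:
```sql
-- Assumed objects for illustration only.
CREATE SCHEMA t1;
CREATE TABLE t1.tbl (a int) DISTRIBUTED BY (a);

-- The SET clause switches search_path for the duration of the call.
CREATE FUNCTION count_tbl() RETURNS bigint AS $$
    SELECT count(*) FROM tbl;   -- resolved against schema t1 via the function's search_path
$$ LANGUAGE sql SET search_path = t1;

SELECT count_tbl();
SHOW search_path;  -- must show the caller's value again, on the QD and on any cached QEs
```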
-
Committed by Ning Yu
Removed the duplicated 'gp_segment_configuration' entry from the MASTER_ONLY_TABLES list. Also sorted the list in alphabetical order to prevent duplicates in the future.
-
Committed by Ning Yu
Gpexpand creates new primary segments by first creating a template from the master datadir and then copying it to the new segments. Some catalog tables are only meaningful on the master, such as gp_segment_configuration, so their contents are cleared on each new segment with "delete from ..." commands. This works but is slow, because we have to include the content of the master-only tables in the archive, distribute it over the network, and clear it via the slow "delete from ..." commands -- the "truncate" command is fast but is disallowed on catalog tables, since the filenode of a catalog table must not change. To make this faster we now exclude these tables from the template directly, so less data is transferred and there is no need to "delete from" them explicitly.
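For context, a rough sketch of the per-segment cleanup this replaces (runnable only with system-table modifications enabled; shown purely to illustrate why excluding the tables from the template is cheaper):
```sql
-- Previously run on each new segment for every master-only catalog table.
-- TRUNCATE would be faster but is disallowed on catalogs (the relfilenode
-- must not change), so a slow row-by-row delete was used instead:
DELETE FROM gp_segment_configuration;
```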
-
Committed by Ning Yu
In the gpexpand behave tests we used to have the same name for multiple scenarios; now we give them distinct, descriptive names. Also corrected some bad indentation.
-
Committed by Ning Yu
The --exclude option is a gpdb-specific option for pg_basebackup; it can be used to specify a path to exclude from the backup archive, and the option can be given multiple times to exclude multiple paths. We used to allow at most 255 excludes. This worked well in the past, but now we plan to use the option to exclude the master-only catalog tables from the gpexpand segment template, and we can easily exceed that limit since many catalog tables are per-database: with enough databases there can be thousands of paths or more to exclude. Increase the limit to 65535, which should be enough in practice. In the future we may want to remove the limit entirely, but for now we can stick with a hard-coded value.
-
Committed by ggbq
split_rows() scans tuples from T and routes them to the new parts (A, B) based on A's or B's constraints. If T has one or more dropped columns before its partition key, T's partition key has a different attribute number from its new parts. In this case the constraints pick the wrong column, which can cause bad behavior. To fix it, each tuple iteration should reconstruct the partition tuple slot and assign it to econtext before the ExecQual calls. The reconstruction may happen once or twice, because we assume A and B might have two different tupdescs. One bad behavior: rows are split into the wrong partitions. Reproduce:
```sql
DROP TABLE IF EXISTS users_test;

CREATE TABLE users_test
(
    id INT,
    dd TEXT,
    user_name VARCHAR(40),
    user_email VARCHAR(60),
    born_time TIMESTAMP,
    create_time TIMESTAMP
)
DISTRIBUTED BY (id)
PARTITION BY RANGE (create_time)
(
    PARTITION p2019 START ('2019-01-01'::TIMESTAMP) END ('2020-01-01'::TIMESTAMP),
    DEFAULT PARTITION extra
);

/* Drop useless column dd for some reason */
ALTER TABLE users_test DROP COLUMN dd;

/* Forgot/Failed to split out new partitions beforehand */
INSERT INTO users_test VALUES(1, 'A', 'A@abc.com', '1970-01-01', '2020-01-01 12:00:00');
INSERT INTO users_test VALUES(2, 'B', 'B@abc.com', '1980-01-01', '2020-01-02 18:00:00');
INSERT INTO users_test VALUES(3, 'C', 'C@abc.com', '1990-01-01', '2020-01-03 08:00:00');

/* New partition arrives late */
ALTER TABLE users_test SPLIT DEFAULT PARTITION
    START ('2020-01-01'::TIMESTAMP) END ('2021-01-01'::TIMESTAMP)
    INTO (PARTITION p2020, DEFAULT PARTITION);

/*
 * - How many new users already in 2020?
 * - Wow, no one.
 */
SELECT count(1) FROM users_test_1_prt_p2020;
```
Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-
Committed by Mark Sliva
We update the pg_hba.conf file with replication entries for each hostname/address to enable cross-subnet cluster expansion. There are no tests for this change, but they can be added at a later time.
Co-authored-by: Jacob Champion <pchampion@pivotal.io>
Co-authored-by: Adam Berlin <aberlin@pivotal.io>
Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
Co-authored-by: David Krieger <dkrieger@pivotal.io>
Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
-
Committed by Mark Sliva
We add a cli_cross_subnet job that creates a cross-subnet cluster and then runs the cross_subnet behave tests. It tests that replication works for each of the affected cross-subnet utilities. We provision 2 CCP clusters in 2 different subnets, and the gpinitsystem task creates a cluster in which every segment pair (including master/standby) replicates across subnets.
Co-authored-by: Jacob Champion <pchampion@pivotal.io>
Co-authored-by: Adam Berlin <aberlin@pivotal.io>
Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
Co-authored-by: David Krieger <dkrieger@pivotal.io>
Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
-
Committed by Mark Sliva
The four CM utilities gpinitsystem, gpinitstandby, gpaddmirrors, and gpmovemirrors now add the relevant pg_hba.conf entries to allow WAL replication to mirrors from their respective primaries across subnets. There are two parts to this commit:
1) modify the CM utilities to add the pg_hba.conf entries that allow WAL replication to mirrors across subnets;
2) test the relevant CM utilities across subnets.
The previous pg_hba.conf replication entry, 'host replication $USER samenet trust', does not allow WAL replication connections across subnets. We keep this entry in order to support single-host development. We then add one replication line for each primary and mirror interface address to new primaries and mirrors. It looks like 'host replication $USER $IP_ADDRESS trust', or 'host replication $USER $HOSTNAME trust' when HBA_HOSTNAMES=1. Further, if there is ever a failover and subsequent promotion, replication connections can still be made to the newly promoted primary from the host on which the previous primary failed, because those addresses get copied over to the new mirror during a pg_basebackup. We also add similar logic to support cross-subnet replication between the master and the standby. This behavior is covered by the cross_subnet behave tests, which assert that the replication connection is valid by manually making the connection, in addition to relying on segments being synchronized, as a way to ensure that the pg_hba.conf file is actually being used.
Co-authored-by: Jacob Champion <pchampion@pivotal.io>
Co-authored-by: Adam Berlin <aberlin@pivotal.io>
Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
Co-authored-by: David Krieger <dkrieger@pivotal.io>
Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
-
Committed by Mark Sliva
The interface addresses used for replication are scanned with a new utility we added called ifaddrs, which prints all of the interface addresses separated by newlines. As an internal utility, it is installed into $GPHOME/libexec. There is no Python 2 library that provides this functionality, so we add it ourselves. Also add a configure dependency on getifaddrs and inet_ntop, which are now required to build a functioning GPDB system. As far as we can tell, the other headers and functions are already handled through other configure checks.
Co-authored-by: Jacob Champion <pchampion@pivotal.io>
Co-authored-by: Adam Berlin <aberlin@pivotal.io>
Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
Co-authored-by: David Krieger <dkrieger@pivotal.io>
Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
- 22 Jan 2020, 4 commits
-
-
Committed by Heikki Linnakangas
The planner code assumed that a foreign path is only created for a base relation. But that is not a valid assumption: an FDW may push down joins and aggregates too, which are represented by join or upper rels in the planner. The code in create_foreign_path() tried to look up the EXECUTE ON attribute from the RelOptInfo->ftEntry field, but that was only set for base relations, which caused a crash when pushing down a join or an aggregate. To fix, propagate the 'exec_location' field to join and upper rels, just like the fdwroutine and serverid fields are propagated. Unfortunately this creates a rather large diff, since those fields are set in many places. That has the potential for bugs of omission as we merge with upstream, but I don't see a better way. If such a bug happens, though, I believe the consequence is just a graceful error from the planner, or a missed push-down, so I think we can live with it. Fixes github issue https://github.com/greenplum-db/gpdb/issues/9209
Reviewed-by: Adam Lee <ali@pivotal.io>
-
Committed by Ning Yu
We used to dump and save the ICW databases at the end of the icw_gporca_centos6 job, which usually took more than 2 hours to complete, so it took even longer to trigger the jobs that depend on this dump, such as the gpexpand and pg_upgrade jobs. Now we generate the dump at the end of icw_planner_centos6, which is much faster. Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/wl3UjtACTzE/xLO8r6wYAAAJ
Reviewed-by: Paul Guo <pguo@pivotal.io>
-
Committed by Paul Guo
Reviewed-by: Ning Yu <nyu@pivotal.io>
Reviewed-by: Gang Xiong <gxiong@pivotal.io>
Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-
Committed by Weinan WANG
This PR brings multi-DQA MPP execution back to master after the 9.6 merge, but with a new strategy. A new node, `TUPLE_SPLIT`, divides each input tuple into `n` output tuples (`n` is the number of DQA exprs). Each output tuple contains only one DQA expr, an `AGGEXPRID` that indicates which DQA function handles this tuple, and all the GROUP BY exprs, so we can process multiple DQAs in one pipeline. After that, a normal two-stage aggregate without DISTINCT is attached on top.
Co-authored-by: Adam Li <ali@pivotal.io>
Inspired by: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Reviewed-by: Gang Xiong <gxiong@pivotal.io>
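A hedged illustration of the multiple-DQA query shape this plan targets; the table and column names are assumed:
```sql
-- Two DISTINCT-qualified aggregates over different columns in one query.
-- TupleSplit emits one copy of each input row per DQA, tagged with AGGEXPRID,
-- so a single two-stage aggregate pipeline can compute both distinct counts.
CREATE TABLE sales (region int, cust int, prod int) DISTRIBUTED BY (region);

SELECT region,
       count(DISTINCT cust) AS customers,
       count(DISTINCT prod) AS products
FROM   sales
GROUP  BY region;
```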
-
- 21 Jan 2020, 8 commits
-
-
Committed by Heikki Linnakangas
Some GPDB-added system tables and columns in upstream catalogs were missing CATALOG_VARLEN markings, or the markings were wrong. This doesn't cause any ill effect, but it's a hazard if someone tries to access the fields through the Form_pg_* struct. Let's be tidy. Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
-
Committed by Heikki Linnakangas
We had optimized this piece of code in Greenplum, but PostgreSQL made a similar optimization in 9.6, in commit aa2387e2. In the 9.6 merge, I picked up the PostgreSQL version but left the Greenplum version in a commented-out block, with the plan to do some performance testing to see whether we can switch to the PostgreSQL version now. I did that performance testing now, and it seems that the old GPDB version was about the same speed as the new PostgreSQL version. I used this to test it:
```sql
-- generate test data
create table tstest (distkey int4, ts timestamp) distributed by (distkey);
insert into tstest select 1, g from generate_series(now(), now() + '1 year', '1 second') g;
vacuum tstest;

-- test query
\timing on
select min(ts::text) from tstest;
```
The query took about 15 seconds on my laptop, and 'perf' says that about 10% of the CPU time was spent in EncodeDateTime, with both versions. Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
-
Committed by Heikki Linnakangas
I'm just about to remove it, but let's fix it first so that we have the fixed version in the git history, in case someone wants to revisit this.
-
Committed by Zhenghua Lyu
The statement `INSERT ... ON CONFLICT DO UPDATE` invokes an update on segments. If the ON CONFLICT update modifies the distribution keys of the table, this leads to wrong data distribution. This commit avoids the issue by raising an error in transformInsertStmt when it finds that the ON CONFLICT update would touch the distribution keys of the table. Fixes github issue: https://github.com/greenplum-db/gpdb/issues/9444
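A hypothetical reproduction of the pattern that is now rejected; the object names are illustrative:
```sql
CREATE TABLE t_upsert (a int UNIQUE, b int) DISTRIBUTED BY (a);

-- The conflict action updates the distribution key "a", which would leave the row
-- on the wrong segment, so transformInsertStmt now raises an error for this.
INSERT INTO t_upsert VALUES (1, 1)
ON CONFLICT (a) DO UPDATE SET a = excluded.a + 1;
```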
-
Committed by Zhenghua Lyu
The statement `INSERT ... ON CONFLICT DO UPDATE` may invoke ExecUpdate on segments, so it should be treated as an update statement when choosing the lock mode. Fixes github issue https://github.com/greenplum-db/gpdb/issues/9449
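For illustration (assumed table), the kind of statement that now takes the update lock mode:
```sql
CREATE TABLE t_lock (a int UNIQUE, b int) DISTRIBUTED BY (a);

-- Although written as an INSERT, the conflict action may run ExecUpdate on segments,
-- so the table is locked as it would be for a plain UPDATE.
INSERT INTO t_lock VALUES (1, 2)
ON CONFLICT (a) DO UPDATE SET b = excluded.b;
```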
-
Committed by Richard Guo
This work is based on the multi-phase aggregation in master. The idea is to perform the grouping-sets aggregation in the partial phase and then a normal aggregation in the final phase. In the partial phase we attach a GROUPINGSET_ID to each transvalue. In the final phase we include this GROUPINGSET_ID at the head of the sort keys and group keys for the group aggregation. This ensures correctness in the presence of NULLs.
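A hedged sketch (assumed table) of a grouping-sets aggregate of the kind handled by this two-phase plan:
```sql
CREATE TABLE gstest (a int, b int, v int) DISTRIBUTED BY (a);

-- The partial phase tags each transvalue with a GROUPINGSET_ID; the final phase
-- sorts/groups on that id first, so NULLs produced by different grouping sets
-- (the (a) set vs. the (b) set, say) are never merged together.
SELECT a, b, sum(v)
FROM   gstest
GROUP  BY GROUPING SETS ((a), (b), ());
```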
-
Committed by Ning Yu
Python 2 reached end-of-life on Jan 1st, 2020. Many upstream Python modules are in the process of dropping Python 2 support, and their newer versions may not work for us. We have encountered several such issues on the pipeline this year; to make our lives easier we now pin the modules to Python 2 compatible versions. For now we only pin the modules used by the Concourse scripts; later we might want to pin more of those used by our utility scripts.
-
- 20 Jan 2020, 5 commits
-
-
Committed by LittleWuCoding
Fix a duplicate PG_SETMASK() statement in postgres.c.
-
Committed by Richard Guo
This patch performs a little cosmetic refactoring on how group_ids are computed. Also, in test case `olap_plans`, I believe we meant to execute `analyze olap_test_single`, as done in this patch. Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-
Committed by Ning Yu
The resources below are only used by the "Publish Server Builds" job, which is only enabled on prod pipelines, so we should remove them from non-prod pipelines, otherwise they cause errors during set-pipeline.
- server-build-centos6
- server-build-centos7
- server-build-ubuntu18.04
-
Committed by Wang Hao
Because plan node ids now start from 0 instead of 1, the expected max nid in the instr_in_shmem_terminate test changed from 5 to 4.
-
Committed by Zhenghua Lyu
Commit ce81ca79 introduced a compiler warning. This commit removes the warning.
-
- 19 Jan 2020, 1 commit
-
-
Committed by Wang Hao
The tmid should have the same value across the cluster; it increases only on a full gpdb cluster restart. Tmid, gp_session_id, and gp_command_count together uniquely identify a single query execution, and monitoring agents such as gpperfmon rely on this uniqueness. Commit 58c7833d introduced a different way to obtain gpmon_gettmid(), which brought in a problem that the tmid may differ between the master and the segments. This commit fixes the problem. Reviewed-by: Ning Yu <nyu@pivotal.io>
-
- 17 Jan 2020, 5 commits
-
-
Committed by Mel Kiyama
* docs - update DISCARD command; DISCARD ALL is not supported
  - Update the DISCARD command reference
  - Update the pgbouncer.ini file that uses DISCARD ALL
  This will be backported to 6X_STABLE and 5X_STABLE.
* docs - DISCARD ALL not supported - review updates
-
Committed by Weinan WANG
Upstream does not create a new pathkey in the convert_subquery_pathkeys function. Doing so also raises an issue in gpdb, so revert it. After that, we need to handle a common case where a Gather Motion should keep the data order but the SubqueryScan cannot provide it. E.g. create a view with an ORDER BY:
```sql
CREATE VIEW v AS SELECT va, vn FROM sourcetable ORDER BY vn;
```
and query it:
```sql
SELECT va FROM v_sourcetable;
```
In the planner, a SubqueryScan is created on top of the view. Upstream this is fine: even though it does not have a pathkey, the data is physically well sorted. But for us there is a Gather Motion above the SubqueryScan. The data is sorted on each segment, but the Gather Motion has no indication of how to merge the tuples. To deal with this problem, just push the Gather Motion down under the SubqueryScan if the SubqueryScan does not have a pathkey but its subpath does. Fixes issue #8987
-
-
Committed by Ashwin Agrawal
Since segments don't have partition information, we need to dispatch the names of all the tables affected by a GRANT to the segments. Doing so scribbles on the existing stmt, which is problematic when the stmt is stored in the plan cache. We shouldn't scribble on a plan-cache stmt, hence make a copy and only modify that. This can all go away once segments have partition information. Fixes https://github.com/greenplum-db/gpdb/issues/9428.
Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
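A hypothetical way to exercise the cached-statement path described above; the table, partition spec, and function are illustrative only:
```sql
CREATE TABLE part_tbl (id int, ts date)
DISTRIBUTED BY (id)
PARTITION BY RANGE (ts)
(START (date '2020-01-01') END (date '2020-03-01') EVERY (INTERVAL '1 month'));

-- Running the GRANT through a PL/pgSQL function caches the statement; the dispatch
-- code used to append the child-table names to that cached tree in place.
CREATE FUNCTION do_grant() RETURNS void AS $$
BEGIN
    GRANT SELECT ON part_tbl TO public;
END;
$$ LANGUAGE plpgsql;

SELECT do_grant();
SELECT do_grant();  -- subsequent calls reuse the cached statement
```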
-
Committed by mkiyama
-
- 16 Jan 2020, 1 commit
-
-
Committed by Ashwin Agrawal
On a primary segment, in normal mode the backend performs the fsync for an AO table itself and doesn't delegate the work to the checkpointer. Hence, it doesn't need to register or forget fsync requests for AO tables. The TRUNCATE command currently shares the same code for heap and AO via ExecuteTruncate() -> heap_truncate_one_rel(). It writes a generic file-truncate WAL record and registers the request with the checkpointer process. In normal mode the UNLINK request is registered with the checkpointer process and the backend doesn't perform the unlink for the base file, irrespective of AO or heap, so this works fine. But during crash recovery we can't determine from the file-truncate record whether it's a heap or an AO file, so an fsync request is registered. Consequently, if mdunlink() skips sending the forget-fsync request for AO and instead directly unlinks the file, it causes a PANIC with "could not fsync file ... No such file or directory". To avoid this situation, and as defensive code, it is better to always forget the relation's fsync request in mdunlink() during crash recovery, irrespective of the storage type. This makes the system resilient in the presence of any such issue, since the only way to recover from "could not fsync file ... No such file or directory" during crash recovery is to reset the xlogs, which is very dangerous and causes data loss. In normal mode we continue to avoid having mdunlink() forget fsync requests for AO, just to avoid overwhelming the fsync request queue. In the longer term it would probably be better to have separate truncate code for AO and heap, and not use the generic file-truncate record for both. But regardless, this change seems better, as stated above.
Reviewed-by: Asim R P <apraveen@pivotal.io>
Reviewed-by: Paul Guo <pguo@pivotal.io>
-
- 15 Jan 2020, 2 commits
-
-
Committed by Heikki Linnakangas
We failed to print the start/end markers of the list of child nodes of Sequence. Fixes github issue https://github.com/greenplum-db/gpdb/issues/9410. Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
-
Committed by Heikki Linnakangas
The functions for PartitionRule in outfuncs.c and readfuncs.c were missing the 'parisdefault' field. That was harmless, because those functions are in fact unused; we never materialize PartitionRule structs to disk. To fix, add the field, and to reduce the chance of similar bugs in the future and to reduce the amount of duplication, remove the binary versions of these functions from outfast.c/readfast.c. Also remove the binary versions of _readPartition() and _outPartition() while we're at it. No fields were missing from those, but less duplicated code is good. Discussion: https://github.com/greenplum-db/gpdb/issues/9390
Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
-