- 07 Sep 2020, 1 commit
Committed by xiaoxiao
* Fix gpload failing in merge mode when there are capital letters in column names: add double quotation marks around column names when creating staging tables, and omit the distribution key.

Co-authored-by: XiaoxiaoHe <hxiaoxiao@vmware.com>
- 04 Sep 2020, 1 commit
Committed by Peter Eisentraut
Using exit() requires stdlib.h, which is not included. Use return instead. Also add a return type for main().

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
(cherry picked from commit 1c0cf52b)
- 03 Sep 2020, 5 commits
Committed by Jesse Zhang
This commit takes advantage of the resource safety afforded by RelationWrapper by using it as the return type of gpdb::GetRelation(). This allows us to write code like this:

```
auto rel = GetRelation(...);
if (!RelIsSupported(rel))
{
	return -1;
}
do_stuff(rel);
```

Instead of code like this before the patch:

```
Relation rel = GetRelation(...);
if (!RelIsSupported(rel))
{
	CloseRelation(rel);
	return -1;
}
GPOS_TRY
{
	do_stuff(rel);
	CloseRelation(rel);
}
GPOS_CATCH_EX(ex)
{
	CloseRelation(rel);
	GPOS_RETHROW(ex);
}
GPOS_CATCH_END;
```
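For context, a minimal sketch of the RAII idea behind such a wrapper, assuming a close-on-destroy contract; the actual RelationWrapper in the ORCA translator may differ in names and members:

```
struct RelationData;
using Relation = RelationData *;
void CloseRelation(Relation rel);  // stand-in for the GPDB-side close call

class RelationWrapper
{
public:
	explicit RelationWrapper(Relation rel) : m_rel(rel) {}

	// Non-copyable but movable: exactly one owner closes the relation.
	RelationWrapper(const RelationWrapper &) = delete;
	RelationWrapper(RelationWrapper &&other) noexcept : m_rel(other.m_rel)
	{
		other.m_rel = nullptr;
	}

	// Runs on every exit path: early return, fall-through, or an exception
	// unwinding the stack, which is what removes the need for the
	// GPOS_TRY/GPOS_CATCH cleanup blocks at the call sites.
	~RelationWrapper()
	{
		if (m_rel)
			CloseRelation(m_rel);
	}

	operator Relation() const { return m_rel; }

private:
	Relation m_rel;
};
```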
Committed by Jesse Zhang
Committed by Jesse Zhang
We're about to introduce a different return type to gpdb::GetRelation in a forthcoming commit. To ease that transition, change Yoda conditions for pointer non-null comparison to the more idiomatic C++ style of using pointers in a boolean context. Also remove one redundant fallback exception.
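For illustration, the style change amounts to the following; the call site is hypothetical:

```
// Before: Yoda-style non-null comparison
if (NULL != rel)
	do_stuff(rel);

// After: pointer used in a boolean context
if (rel)
	do_stuff(rel);
```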
Committed by Adam Lee
This will make LISTEN and NOTIFY work on the QD node.
Committed by Bhuvnesh Chaudhary
For the GPDB appliance, there are certain GUCs which are set using gpconfig, but currently this fails because MASTER_DATA_DIRECTORY is not exported. This commit exports MASTER_DATA_DIRECTORY so that gpconfig succeeds. It also allows setting DCA_VERSION_FILE to enable testing, and adds a test to ensure that the DCA configuration GUCs are set properly in the environment.
- 02 Sep 2020, 4 commits
Committed by Hubert Zhang
Committed by Hubert Zhang
The test case contains a query like 'insert into t select i from generate_series(1,10) i'. The slice running 'generate_series' has a General locus, so it might be executed on any segment depending on the session id, and that makes the test flaky. To make it deterministic, we change generate_series to a regular table and filter the data with gp_segment_id. This commit also removes the alternative expected files.

Co-authored-by: Gang Xiong <gangx@vmware.com>
Committed by Hubert Zhang
Committed by Hubert Zhang
Resource groups used to access resGroupSlot in SessionState without a lock. This is correct as long as a session only accesses its own resGroupSlot. But since we introduced the runaway feature, we need to traverse the session array to find the top consumer session when the red zone is reached. This requires:
1. The runaway detector should hold the shared resource group lock, to avoid a resGroupSlot being detached from a session concurrently when the red zone is reached.
2. A normal session should hold the exclusive lock when modifying the resGroupSlot in its SessionState.
Also fix a compile warning.

Reviewed-by: Ning Yu <nyu@pivotal.io>
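A hedged sketch of the locking protocol described above; ResGroupLock, SessionState, and resGroupSlot follow the commit text, while the traversal and the memory-usage fields are illustrative pseudocode, not the actual GPDB code:

```
/* Runaway detector, on reaching the red zone: hold the shared lock so no
 * session can concurrently detach its resGroupSlot while we scan. */
LWLockAcquire(ResGroupLock, LW_SHARED);
SessionState *top = NULL;
for (SessionState *s = sessions; s != NULL; s = s->next)
{
	if (s->resGroupSlot != NULL && (top == NULL || s->memUsage > top->memUsage))
		top = s;  /* candidate top memory consumer */
}
LWLockRelease(ResGroupLock);

/* A normal session: hold the exclusive lock while modifying its own slot. */
LWLockAcquire(ResGroupLock, LW_EXCLUSIVE);
MySessionState->resGroupSlot = NULL;  /* detach */
LWLockRelease(ResGroupLock);
```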
- 01 Sep 2020, 8 commits
Committed by David Kimura
This approach special-cases gp_segment_id enough to include the column as a distributed-column constraint. It also updates direct dispatch info to be aware of gp_segment_id, which represents the raw value of the segment where the data resides. This is different from other columns, which hash the datum value to decide where the data resides. After this change, the following DDL shows a Gather Motion from 2 segments on a 3-segment demo cluster:

```
CREATE TABLE t(a int, b int) DISTRIBUTED BY (a);

EXPLAIN SELECT gp_segment_id, * FROM t WHERE gp_segment_id=1 or gp_segment_id=2;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..431.00 rows=1 width=12)
   ->  Seq Scan on t  (cost=0.00..431.00 rows=1 width=12)
         Filter: ((gp_segment_id = 1) OR (gp_segment_id = 2))
 Optimizer: Pivotal Optimizer (GPORCA)
(4 rows)
```
Committed by Heikki Linnakangas
This is more in line with upstream parallel plans, where the estimates also mean "per worker". NOTE: The rows/tuples/pages in RelOptInfo still represent whole-rel values. That's the only thing that makes sense for join rels, which could have Paths with different locus.

This doesn't change the row counts displayed in EXPLAIN output, because previously we divided the row counts stored on the plan nodes by the number of segments for display purposes. With this patch, that's no longer necessary. You can see the difference in the cost estimates, however.

This doesn't affect GPORCA's cost model, and the GPORCA translator has been modified to divide row count estimates in the final plan by the number of segments, to keep the row counts shown in EXPLAIN comparable with the Postgres planner's numbers, and unchanged from previous versions.

This includes some changes to GPORCA output files too. Most of the real changes that are not just to plans in queries where GPORCA falls back are because I added an "ANALYZE int8_tbl" to the int8 test. That affects many test queries that used the int8_tbl table. I added the "ANALYZE int8_tbl" command to make one of the planner tests produce the same plan as before (I forget which one, unfortunately).

Discussion: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/cGZsAFiRfBE/m/aq6PKj23AwAJ
Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
Reviewed-by: Jinbao Chen <jinchen@pivotal.io>
Committed by Heikki Linnakangas
Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
Committed by Hubert Zhang
A proxy bgworker would become an orphan process after the postmaster dies, due to the lack of checking the pipe postmaster_alive_fds[POSTMASTER_FD_WATCH]. Epoll this pipe inside the proxy bgworker main loop as well.

Reviewed-by: Ning Yu <nyu@pivotal.io>
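A hedged sketch of the fix, assuming an epoll-based main loop; postmaster_alive_fds and POSTMASTER_FD_WATCH are the PostgreSQL names from the commit text, while epfd, events, and the surrounding scaffolding are assumed to exist in the proxy's loop:

```
/* Register the read end of the postmaster-death pipe with the proxy's
 * epoll set; it becomes readable (EOF) once the postmaster exits. */
struct epoll_event ev = {};
ev.events = EPOLLIN;
ev.data.fd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
epoll_ctl(epfd, EPOLL_CTL_ADD, ev.data.fd, &ev);

/* In the main loop: exit instead of lingering as an orphan. */
int n = epoll_wait(epfd, events, nevents, timeout_ms);
for (int i = 0; i < n; i++)
{
	if (events[i].data.fd == postmaster_alive_fds[POSTMASTER_FD_WATCH])
		proc_exit(1);  /* postmaster is gone */
}
```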
Committed by Hubert Zhang
When calculating the safeChunksThreshold for runaway detection in resource groups, we used to divide by 100 to get the number of safe chunks. This could round small chunk numbers down to zero. Fix it by storing safeChunksThreshold100 (100 times bigger than the real safe chunk number) and doing the division on the fly.

Reviewed-by: Ning Yu <nyu@pivotal.io>
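A self-contained illustration of the rounding hazard and the fix; the variable names echo the commit, and the numbers are made up:

```
#include <cstdio>

int main()
{
	// Hypothetical numbers, chosen so integer division truncates.
	int groupChunks = 30;   // chunks granted to a resource group
	int safeRatioPct = 2;   // percent of chunks kept as the safe margin

	// Old code precomputed the margin in whole chunks: 30 * 2 / 100 == 0,
	// so the red zone check degenerated and could never fire early.
	int safeChunksThreshold = groupChunks * safeRatioPct / 100;

	// Fixed code stores the margin 100x bigger and divides at comparison
	// time, keeping the fractional 0.6-chunk margin.
	int safeChunksThreshold100 = groupChunks * safeRatioPct;  // == 60

	int availableChunks = 0;
	std::printf("old check fires: %d\n",
	            availableChunks < safeChunksThreshold);          // 0
	std::printf("new check fires: %d\n",
	            availableChunks * 100 < safeChunksThreshold100); // 1
	return 0;
}
```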
Committed by Jesse Zhang
This is in no way exhaustive; I'm only changing what seems abundantly obvious and greppable.
Committed by Jesse Zhang
While we're at it, also add another attribute, GPOS_ASSERTS_ONLY. This should help us eliminate a lot of clutter around code that looks like this:

```
BOOL result = m_cte_consumer_info->Insert(key, GPOS_NEW(m_mp) SCTEConsumerInfo(cte_plan));
```
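A hedged sketch of how such an asserts-only attribute can be defined and used; the actual GPOS definition may differ:

```
#include <cassert>

// In debug builds the variable feeds an assert; in release builds the
// assert compiles away, so mark the variable as intentionally unused.
#ifdef GPOS_DEBUG
#define GPOS_ASSERTS_ONLY
#else
#define GPOS_ASSERTS_ONLY __attribute__((unused))
#endif

bool Insert(int key);  // stand-in for the real hash-table call

void AddConsumerInfo(int key)
{
	GPOS_ASSERTS_ONLY bool result = Insert(key);
	assert(result);  // no "unused variable" warning in release builds
}
```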
Committed by Jesse Zhang
With this patch, the whole translator compiles warning-free. null_ndv was orphaned in commit 25479cf1 ("Fix num_distinct calculation in relcache translator"). coercePathType was dead on arrival in commit cc799db4 ("Fix Relcache Translator to send CoercePath info (#2842)").
- 31 Aug 2020, 7 commits
Committed by xiong-gang
When doing 'VACUUM FULL', 'swap_relation_files' updates the pg_class entry but does not increment the command counter, so the later 'vac_update_relstats' will in-place update the 'relfrozenxid' and 'relhasindex' of the old tuple. If the transaction is interrupted and aborted on the QE after this, the old entry is corrupted. This problem was partially fixed by commit 7f7fa498; this commit separates the code that sends stats to the QD and calls it in 'vac_update_relstats' instead of updating the stats on the QE.
Committed by Heikki Linnakangas
Index Only Scans have not been implemented on Bitmap Indexes, but in certain circumstances, when the query doesn't need any of the attributes from the index, like in "SELECT count(*) from table", the planner may still choose an Index Only Scan. It's debatable if that's actually a planner bug, but we can easily support that limited case.

Reviewed-by: Ashwin Agrawal <aashwin@vmware.com>
Committed by Heikki Linnakangas
If you have a pre-sorted input, like an Index Scan, and a DISTINCT clause, the planner would create an invalid plan. A Redistribute Motion node breaks the ordering of its input, so such a plan cannot be used as input to a Unique node.

This is possibly unreachable at the moment, because parse analysis transforms simple DISTINCT queries to GROUP BY (see the call to transformDistinctToGroupBy() in transformSelectStmt()). I have not been able to come up with a query that would exercise this codepath; any simple query is transformed to a GROUP BY, and anything more complicated, with window functions or aggregates, doesn't yield sorted input to the DISTINCT stage. But if you disable the DISTINCT -> GROUP BY transformation in parse analysis, this query caused an assertion failure before this commit:

```
postgres=# create table distincttest (i int, j int) distributed by (i);
CREATE TABLE
postgres=# create index on distincttest (j);
CREATE INDEX
postgres=# set gp_enable_multiphase_agg=off; set enable_hashagg=off; set enable_seqscan=off; set enable_bitmapscan=off;
SET
SET
SET
SET
postgres=# explain select distinct j from distincttest;
FATAL:  Unexpected internal error (createplan.c:6871)
DETAIL:  FailedAssertion("!(numCols >= 0 && numCols <= list_length(pathkeys))", File: "createplan.c", Line: 6871)
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
```
Committed by Denis Smirnov
Glibc implementations are known to return inconsistent results for strcoll() and strxfrm() on many platforms, which can cause unpredictable bugs. Because of that, PostgreSQL has disabled strxfrm() by default since 9.5, at compile time, behind the TRUST_STRXFRM definition. Greenplum has its own mk sort implementation that can also use strxfrm(), so mk sort can also be affected by the strcoll()/strxfrm() inconsistency (it breaks merge joins). That is why strxfrm() should be disabled by default for mk sort as well, behind a TRUST_STRXFRM_MK_SORT definition. We don't reuse PostgreSQL's TRUST_STRXFRM definition because many users ran Greenplum with strxfrm() enabled for mk sort while it was disabled in the PostgreSQL core; keeping TRUST_STRXFRM_MK_SORT as a separate definition allows these users to avoid reindexing after a version upgrade.

Reviewed-by: Asim R P <pasim@vmware.com>
Reviewed-by: Heikki Linnakangas <linnakangash@vmware.com>
Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
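A self-contained check of the invariant in question (not GPDB code): comparing two strings with strcoll() must order them the same way as comparing their strxfrm() keys byte-wise. Where glibc violates this, sort runs built from strxfrm() keys disagree with comparisons done with strcoll():

```
#include <clocale>
#include <cstdio>
#include <cstring>
#include <string>

// Build the strxfrm() sort key of s under the current LC_COLLATE locale.
static std::string SortKey(const char *s)
{
	size_t n = std::strxfrm(nullptr, s, 0);  // required buffer size
	std::string key(n + 1, '\0');
	std::strxfrm(&key[0], s, n + 1);
	key.resize(n);
	return key;
}

int main()
{
	std::setlocale(LC_COLLATE, "");  // take collation from the environment

	const char *a = "straße";
	const char *b = "strasse";

	int direct = std::strcoll(a, b);
	int viaKeys = SortKey(a).compare(SortKey(b));
	std::printf("strcoll: %d, key compare: %d, consistent: %s\n",
	            direct, viaKeys,
	            (direct < 0) == (viaKeys < 0) && (direct > 0) == (viaKeys > 0)
	                ? "yes" : "NO");
	return 0;
}
```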
Committed by xiaoxiao
Fix the match-column condition to resolve a primary key conflict when using the gpload merge mode to import data into a multi-level partitioned table. Also fix failures when there are special characters or capital letters in column names.

Co-authored-by: XiaoxiaoHe <hxiaoxiao@vmware.com>
Committed by (Jerome)Junfeng Yang
When the QD acquires sample rows on the QEs, a QE should only collect the requested sample rows, either for the parent table alone or for all inherited tables. Otherwise, the QD may get wrong results for the parent table, since the samples for the inherited tables will overwrite the expected values, and we would end up with incorrect results in pg_class and pg_statistic.

The `gp_acquire_sample_rows` function has three inputs, but somehow the code lost the usage of the last argument, `inherited`. This argument is important to distinguish whether the QD needs samples for the parent only or for all inherited tables.

On the QE, when receiving an ANALYZE request through gp_acquire_sample_rows, we should perform do_analyze_rel for either the parent table only or all of its child tables, because the QD sends two acquire-sample-rows requests to the QE. To distinguish the two requests, we check the ctx->inherited value.
Committed by 盏一
The basic idea of enabling auto-ANALYZE only through the master's autovacuum daemon is to collect pgstat info on the master when executing queries. Start the master's autovacuum launcher process, and fire an autovacuum worker process for a database on the master when the naptime is reached. The autovacuum worker then iterates through all tables and materialized views in the specified database and executes ANALYZE for the tables which have reached the analyze threshold. Note that the ANALYZE statement issued by the autovacuum worker on the master is the same as one executed through a query on the QD; i.e., auto-ANALYZE is coordinated by the master, and segments do not start their own autovacuum launchers and workers. For more details, refer to src/backend/postmaster/README.auto-ANALYZE.

Co-authored-by: Junfeng(Jerome) Yang <jeyang@pivotal.io>
- 28 Aug 2020, 4 commits
Committed by Heikki Linnakangas
Commit 0c27e42a changed the way that the gp_acquire_sample_rows() function, called by ANALYZE, collects the sample rows. With the commit, the sample size was not chosen correctly. The sample size is passed to gp_acquire_sample_rows() as an argument, 'targrows', but the function did not pass it down to the do_analyze_rel() function that actually collects the sample. As a result, do_analyze_rel() collected a larger sample, but gp_acquire_sample_rows() only returned the first 'targrows' rows of it to the caller. For example, if you have three segments and the total desired sample size is 3000 rows, gp_acquire_sample_rows() is called with targrows=1000. But do_analyze_rel() nevertheless collected a sample with 3000 rows, and only the first 1000 rows of it were returned to the QD. The end result was that the sample was highly biased towards the physical beginning of the table.

This adds a test case, which creates and ANALYZEs a table with values 0-99, with 100 copies of each distinct value. The table is populated in order, so there is perfect correlation between the physical order and the values. Before this patch, ANALYZE built a histogram like this for it:

```
regression=# select histogram_bounds from pg_stats s where tablename = 'uniformtest';
        histogram_bounds
---------------------------------
 {0,3,6,10,13,17,20,24,27,34,40}
(1 row)
```

After this fix:

```
        histogram_bounds
----------------------------------
 {0,8,21,32,42,51,60,71,81,89,99}
(1 row)
```

Commit 0c27e42a updated the plan in the expected output of the 'gp_aggregates_costs' test. This reverts it back; the reason it changed was that the statistics were bogus, and now they're good again. I'm not sure which plan actually is better for that query. The cost estimates are not very accurate in either case, but they're inaccurate in different ways. The query actually returns 300000 rows; the estimate with the bogus stats was 463756 rows, and with the correct stats it's 103613.
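A self-contained illustration of the bias, not the actual ANALYZE code: positions are sampled uniformly, returned in physical order, and only the first 'targrows' of them are kept:

```
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

int main()
{
	// Positions 0..9999 stand in for a 10000-row table; sample rows come
	// back in physical (ascending) order, as in acquire_sample_rows().
	std::mt19937 gen(42);
	std::uniform_int_distribution<int> pos(0, 9999);

	std::vector<int> sample(3000);
	for (int &p : sample)
		p = pos(gen);
	std::sort(sample.begin(), sample.end());

	// Buggy behavior: collect 3000 rows but hand back only the first 1000,
	// i.e. rows from roughly the first third of the table.
	long truncated = 0, full = 0;
	for (int i = 0; i < 1000; i++)
		truncated += sample[i];
	for (int p : sample)
		full += p;

	std::printf("mean position, truncated sample: %ld\n", truncated / 1000); // ~1650
	std::printf("mean position, full sample:      %ld\n", full / 3000);      // ~5000
	return 0;
}
```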
Committed by Paul Guo
This is used to avoid the "writer segworker group shared snapshot collision on id 153871" kind of error. Pengzhou and I saw this in a real production environment on GPDB 5. Pengzhou suspected that the writer gang exits due to gp_vmem_idle_resource_timeout, but exits slowly because of ProcArrayLock contention, so the collision happens when a new gang is created. The theory was roughly verified with process core dumps from when the issue happens: ProcArrayLock contention was found in those core files. Increase the default gp_snapshotadd_timeout value to tolerate the case better. We have been optimizing ProcArrayLock, but we cannot 100% avoid the contention.
Committed by Hao Wu
In previous GPDB versions, the distribution keys could be changed implicitly when creating a unique index on a hash-distributed empty table:

```SQL
create table foo(a int, b int) distributed by(a);
create unique index on foo(b);
-- now, foo is hash distributed by b, not by a
```

This might (maybe) be useful to avoid changing the distribution keys. On the other side, however, it's dangerous if the user doesn't notice the NOTICE message, "NOTICE: updating distribution policy to match new UNIQUE index". What's worse, this behavior can bring data inconsistency. See:

```SQL
create table foo(a int, b int) distributed by(a);
insert into foo select i,i from generate_series(1,5)i;
create table foopart (i int4, j int4) distributed by (i) partition by range (i) (start (1) end (3) every (1));
create unique index on foopart_1_prt_1 (j);
insert into foopart values(1,2),(2,1);
```

The data inconsistency is:

```
gpadmin=# select gp_segment_id, * from foopart_1_prt_1;
 gp_segment_id | i | j
---------------+---+---
             1 | 1 | 2
(1 row)

gpadmin=# select * from foo f, foopart_1_prt_1 p where f.a = p.j;
 a | b | i | j
---+---+---+---
(0 rows)
```

Implicitly changing the distribution keys is not very useful, but it is harmful. This PR disables changing the distribution keys when creating a unique index.

Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
- 27 Aug 2020, 1 commit
Committed by mkiyama
- 26 Aug 2020, 3 commits
Committed by Heikki Linnakangas
This introduces two new functions, cdb_prepare_path_for_sorted_agg() and cdb_prepare_path_for_hashed_agg(), to sort and/or redistribute the input to an Agg node in single-phase aggregation. Previously, that logic was in the callers in planner.c. This is a nice cleanup now, but it is particularly helpful for the PostgreSQL v12 merge, which will introduce more codepaths that create Agg nodes; encapsulating the logic in functions reduces the duplication. Parallel grouping is currently disabled altogether, but if it weren't, we should be using these functions when creating parallel grouping paths, too.

There's one almost user-visible change here, which explains the change in the 'gp_aggregates' expected output. If a sorted Gather Motion is created, we now use the path keys needed for the grouping (root->grouped_pathkeys), rather than the pathkeys of the subpath (subpath->pathkeys), as the merge key for the Gather Motion. The grouped_pathkeys must be a subset of the subpath's keys, but the subpath might have extra keys that are not needed for the Agg. Don't bother to preserve the order of those extra keys, mostly because it's more convenient in the code not to bother with it, but in principle it also saves some CPU cycles.

Reviewed-by: Gang Xiong <gxiong@pivotal.io>
Committed by Xiaoran Wang
* Fix url_curl on macOS

Fix libcurl being unable to read data from gpfdist on macOS. Note that gpfdist with a pipe still cannot work on macOS, as flock(2), which is used in gfile.c, is not supported there.
Committed by Heikki Linnakangas
If you have plan_cache_mode=auto, which is the default, never try to generate "generic" plans. GPORCA doesn't support Param nodes, so it will always fall back to the Postgres planner for them. What happened without this patch was that the backend code would compare the cost of the custom plan generated with GPORCA against the cost of a generic plan generated with the Postgres planner, and that doesn't make much sense, because GPORCA has a very different cost model from the Postgres planner.

No test, because it would be quite tedious and fragile to write one, and the code change seems simple enough. I bumped into this while hacking on PR #10676, which changes the Postgres planner's cost model. There's a test in 'direct_dispatch' for the generic plan generation, and it started to fail because, with the planner cost model changes, the Postgres planner's generic plan started to look cheaper than the custom plan generated with GPORCA. So we do have some test coverage for this, although accidental.

Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
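A hedged sketch of the decision described above; the real logic lives in plancache.c's choose_custom_plan(), and the names and signature here are illustrative:

```
enum PlanCacheMode { AUTO, FORCE_CUSTOM_PLAN, FORCE_GENERIC_PLAN };

bool
choose_custom_plan(PlanCacheMode plan_cache_mode, bool optimizer,
                   double avg_custom_cost, double generic_cost)
{
	if (plan_cache_mode == FORCE_CUSTOM_PLAN)
		return true;
	if (plan_cache_mode == FORCE_GENERIC_PLAN)
		return false;

	// The fix: with GPORCA enabled, never pick a generic plan in auto mode.
	// Custom plans come from GPORCA, the generic plan from the Postgres
	// planner, and the two cost models are not comparable.
	if (optimizer)
		return true;

	return avg_custom_cost < generic_cost;
}
```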
- 25 Aug 2020, 6 commits
Committed by David Yozie
Committed by Mel Kiyama
* docs - add information on upgrading to PostGIS 2.5.4 (upgrade instructions from 2.1.5 to different versions of 2.5.4)
* docs - upgrade to PostGIS 2.5.4, review comments
* docs - more review comment updates: reorder upgrade sections; clarify that removing the PostGIS package means removing the gppkg
* docs - minor edit
* docs - review updates: more emphasis on removing PostGIS from a database deleting objects
  - create separate paragraph in Upgrading section
  - add warning in Removing PostGIS section
* docs - minor review comment update
* small edits

Co-authored-by: David Yozie <dyozie@pivotal.io>
Committed by Chris Hajas
Commit 445fc7cc hardened some parts of analyzedb. However, it missed a couple of cases.

1) When the statement to get the modcount from the pg_aoseg table failed due to a dropped table, the transaction was also terminated. This caused further modcount queries to fail, and while those tables were analyzed, analyzedb would error and not properly record the modcount. Therefore, we now restart the transaction when it errors.

2) If the table is dropped and then recreated while analyzedb is running (or some other mechanism results in the table being successfully analyzed even though the pg_aoseg table did not exist during the initial check), the logic to update the modcount may fail. Now, we skip the update for the table if this occurs. In this case, the modcount is not recorded, and the next analyzedb run will consider the table modified (dirty) and re-analyze it, which is the desired behavior.
Committed by Heikki Linnakangas
It would sometimes fail like this:

```
--- /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/combocid.out	2020-08-25 03:14:48.314831054 +0000
+++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/combocid.out	2020-08-25 03:14:48.326832158 +0000
@@ -66,7 +66,7 @@
 FETCH ALL FROM c;
  ctid  | cmin | foobar | distkey
 -------+------+--------+---------
- (0,1) |    0 |      1 |
+ (0,1) |    1 |      1 |
  (0,2) |    1 |      2 |
  (0,5) |    0 |    333 |
 (3 rows)
```

I was able to reproduce that locally by inserting a random delay in the SeqNext() function.
Committed by Zhenghua Lyu
This commit implements the same feature for the planner as PR https://github.com/greenplum-db/gpdb/pull/10679. It does not implement the group-by feature of PR 10679. The following commit message is almost the same as PR 10679's.

This approach special-cases gp_segment_id enough to include the column as a distributed-column constraint. It also updates direct dispatch info to be aware of gp_segment_id, which represents the raw value of the segment where the data resides. This is different from other columns, which hash the datum value to decide where the data resides. After this change, the following DDL shows a Gather Motion from 2 segments on a 3-segment demo cluster:

```
CREATE TABLE t(a int, b int) DISTRIBUTED BY (a);

EXPLAIN SELECT gp_segment_id, * FROM t WHERE gp_segment_id=1 or gp_segment_id=2;
                             QUERY PLAN
-----------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)
   ->  Seq Scan on t
         Filter: ((gp_segment_id = 1) OR (gp_segment_id = 2))
 Optimizer: Postgres query optimizer
(4 rows)
```
Committed by Divyesh Vanjare
* allow NEq predicates with lossy casts for PS
* Updating some mdp files that were missing the PartitionTypes XML tag
* Update partition_pruning result file for planner
* adding new testcase for NEq
* move icg tests from partition_pruning to gporca
* add test for checking multilevel range/list PS

Co-authored-by: Hans Zeller <hzeller@vmware.com>