- 24 September 2019, 3 commits
-
-
Committed by Heikki Linnakangas
Printing the slice information makes sense for Init Plans, which are dispatched separately, before the main query, but not so much for other Sub Plans, which are just part of the plan tree; there is no dispatching or motion involved at such SubPlans. The SubPlan might *contain* Motions, but we print the slice information for those Motions separately. The slice information was always just the same as the parent node's, which adds no information and can be misleading if it makes the reader think that there is inter-node communication involved in such SubPlans.
-
Committed by Ashwin Agrawal
The gp_tablespace_with_faults test writes a no-op record and waits for the mirror to replay it before deleting the tablespace directories. This step sometimes fails in CI and causes flaky behavior, due to existing code behavior in the startup and walreceiver processes. If the primary writes a big xlog record (one spanning multiple pages), flushes only part of it via XLogBackgroundFlush(), and then restarts before committing the transaction, the mirror receives only the partial record and waits for the complete one. Meanwhile, after recovery, a no-op record gets written in place of that big record, and the startup process on the mirror keeps waiting to receive xlog beyond the previously received point before proceeding. Hence, as a temporary workaround until the actual code problem is resolved, and to avoid failures for this test, switch xlog before emitting the no-op xlog record, so that the no-op record lands far away from the previously emitted xlog record.
-
Committed by Jimmy Yih
When the gp_use_legacy_hashops GUC was set, CTAS would not assign the legacy hash operator class to the new table. This is because CTAS goes through a different code path and uses the first operator class of the SELECT's result when no distribution key is provided.
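A quick way to observe the fixed behavior; a minimal sketch, assuming GPDB 6+ catalog columns (table names invented):

```sql
SET gp_use_legacy_hashops = on;
CREATE TABLE t1 (a int, b int) DISTRIBUTED BY (a);
-- no explicit DISTRIBUTED BY: CTAS must pick an operator class itself
CREATE TABLE t2 AS SELECT * FROM t1;
-- with the fix, t2's distribution key should use the legacy cdbhash
-- operator class, matching t1's, instead of the default one
SELECT localoid::regclass, distkey, distclass
  FROM gp_distribution_policy
 WHERE localoid IN ('t1'::regclass, 't2'::regclass);
```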
-
- 23 September 2019, 2 commits
-
-
Committed by Zhenghua Lyu
In Greenplum, when estimating costs, most of the time we are in a global view, but sometimes we have to shift to a local view. Postgres does not suffer from this issue because everything is in one single segment. The function `estimate_hash_bucketsize` comes from Postgres and plays a very important role in the cost model of hash join. It should output a result based on a local view, yet its input parameters, such as the row count of a table and the ndistinct of the relation, are all taken from a global view (across all segments). So we have to compensate: 1. for a broadcast-like locus, the global ndistinct is the same as the local one, so we compensate with `ndistinct *= numsegments`; 2. when the hash key is collocated with the locus, each segment holds `ndistinct/numsegments` distinct groups, so no compensation is needed; 3. otherwise, the locus is partitioned and not collocated with the hash keys, and for these cases we first estimate the local distinct group number and then compensate. Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
-
Committed by Heikki Linnakangas
Fixes GitHub issue https://github.com/greenplum-db/gpdb/issues/8621
-
- 21 September 2019, 1 commit
-
-
Committed by Heikki Linnakangas
I've been wondering for some time why we have disabled constructing Init Plans in queries that are planned in QEs, like in SPI queries that run in user-defined functions. So I removed the diff vs upstream in build_subplan() to see what happens. It turns out it was because we always ran the ExtractParamsFromInitPlans() function in QEs, to get the InitPlan values that the QD sent with the plan, even for queries that were not dispatched from the QD but planned locally. Fix the call in InitPlan to only call ExtractParamsFromInitPlans() for queries that were actually dispatched from the QD, and allow QE-local queries to build Init Plans. Include a new test case, for clarity, even though there were some existing ones that incidentally covered this case.
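To illustrate the kind of query this affects, a hedged sketch (names invented; whether the inner query actually gets planned in a QE depends on where the function executes):

```sql
CREATE TABLE measurements (v numeric) DISTRIBUTED RANDOMLY;

-- The SPI-planned query below contains an uncorrelated subquery; with this
-- change it can become an Init Plan even when planned locally in a QE,
-- instead of being forced into a regular Sub Plan.
CREATE FUNCTION above_avg_count() RETURNS bigint AS $$
    SELECT count(*) FROM measurements
     WHERE v > (SELECT avg(v) FROM measurements);
$$ LANGUAGE sql;

SELECT above_avg_count();
```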
-
- 20 September 2019, 3 commits
-
-
Committed by Paul Guo
* Ship the modified python module subprocess32 again. subprocess32 is preferred over subprocess according to the Python documentation. In addition, we long ago modified its code to use vfork() instead of fork(), to avoid "Cannot allocate memory" kinds of errors (false alarms, though - memory is actually sufficient) in GPDB production environments, which usually run with memory overcommit disabled. We compiled and shipped it as well, but later it was compiled yet not shipped, apparently due to a makefile change (maybe a regression). Let's ship it again. * Replace subprocess with our own subprocess32 in python code.
-
Committed by Paul Guo
1. checkpoint_segments no longer exists since PostgreSQL 9.5; clean up the code that references it. 2. GPTest.pm should be cleaned up in src/test/regress/GNUmakefile. Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
-
Committed by Sambitesh Dash
- The corresponding ORCA PR is https://github.com/greenplum-db/gporca/pull/533 - Change the GUC value OPTIMIZER_UNEXPECTED_FAIL so that we log only unexpected failures. Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io> Co-authored-by: Sambitesh Dash <sdash@pivotal.io>
-
- 19 September 2019, 5 commits
-
-
Committed by Gang Xiong
-
Committed by xiong-gang
There was a hang like this: when one QE errors out before 'SetupInterconnect', the QD keeps waiting for the incoming connections to be established and doesn't check the error message from the dispatcher, while the other QEs finish and hang in the function 'waitOnOutbound'. Co-authored-by: Asim R P <apraveen@pivotal.io> Co-authored-by: Gang Xiong <gxiong@pivotal.io> Co-authored-by: Ning Yu <nyu@pivotal.io> (cherry picked from commit b2101122)
-
Committed by Shreedhar Hardikar
-
Committed by Ning Yu
These tests use fault injection; each of them must be put in a separate test group, otherwise they will be flaky.
-
Committed by Ning Yu
Some tests use fault injection; they can be flaky when put into parallel test groups, because the injected faults can be triggered by other tests in the same group. To make the test results deterministic, each of these tests must be put in a separate group. Such flaky tests are very confusing, since a test fails randomly due to a fault injected by another test, which is hard to debug. So we add a check at the beginning of the ICW and isolation2 tests that raises an error when fault-injector tests are put in parallel test groups; see the sketch below.
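A hedged sketch of the clash (fault name and segment selection are illustrative; gp_inject_fault and gp_wait_until_triggered_fault are the fault-injector extension functions):

```sql
-- Test 1: inject a 'skip' fault on content 0's primary, then wait for it.
SELECT gp_inject_fault('checkpoint', 'skip', dbid)
  FROM gp_segment_configuration WHERE role = 'p' AND content = 0;
-- If another test in the same parallel group runs a CHECKPOINT now, *it*
-- consumes the fault, and either test can fail unpredictably.
SELECT gp_wait_until_triggered_fault('checkpoint', 1, dbid)
  FROM gp_segment_configuration WHERE role = 'p' AND content = 0;
-- Always reset when done.
SELECT gp_inject_fault('checkpoint', 'reset', dbid)
  FROM gp_segment_configuration WHERE role = 'p' AND content = 0;
```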
-
- 17 September 2019, 1 commit
-
-
Committed by Weinan WANG
* When a single SET command fails, roll back the GUC value. In GPDB, the GUC set flow is: 1. the QD sets the GUC; 2. the QD dispatches the job to all QEs; 3. the QEs set the GUC. For a single SET command this is not 2PC-safe in GPDB: if the SET command fails on a QE, only the QD's GUC value is rolled back. A GUC with the GUC_GPDB_NEED_SYNC flag requires the value to be the same across the whole session. To deal with this, record rolled-back GUCs in AbortTransaction into a restore-GUC list and re-set them when the next query arrives. If that fails again, destroy all QEs, since we cannot keep the same value within that session. Hopefully the GUC value can then be synchronized successfully in a later creategang stage using the '-c' option.
-
- 13 September 2019, 1 commit
-
-
Committed by Ashwin Agrawal
The old wait_for_trigger_fault() in setup.sql is no longer needed; gp_wait_until_triggered_fault() now provides the same functionality in much better shape. Hence, delete wait_for_trigger_fault() and replace its only existing usages with gp_wait_until_triggered_fault(), as sketched below.
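The shape of the replacement, as a hedged sketch (fault name, argument order, and segment selection are illustrative):

```sql
-- before: hand-rolled polling helper defined in setup.sql
-- SELECT wait_for_trigger_fault('fault_name', ...);
-- after: the built-in fault-injector function
SELECT gp_wait_until_triggered_fault('fault_name', 1, dbid)
  FROM gp_segment_configuration WHERE role = 'p' AND content = 0;
```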
-
- 12 September 2019, 8 commits
-
-
Committed by Ashwin Agrawal
The "0U: End" should only be executed after VACUUM has reached the point of waiting for the lock. If executed before that, VACUUM will not wait for the lock, which invalidates the test and makes it flaky, failing with the following diff:

```
--- /tmp/build/e18b2f02/gpdb_src/src/test/isolation2/expected/vacuum_drop_phase_ao.out 2019-09-05 00:14:41.580197372 +0000
+++ /tmp/build/e18b2f02/gpdb_src/src/test/isolation2/results/vacuum_drop_phase_ao.out 2019-09-05 00:14:41.580197372 +0000
@@ -32,11 +32,12 @@
 DELETE 4
 -- We should see that VACUUM blocks while the QE holds the access shared lock
 1&: VACUUM ao_test_drop_phase; <waiting ...>
+FAILED: Forked command is not blocking; got output: VACUUM
 0U: END;
 END
 1<: <... completed>
-VACUUM
+FAILED: Execution failed
```

This happens because "&:" in the isolation2 framework runs that step in the background, so the next step, in a separate session, executes in parallel with it. To resolve the situation, add a helper, wait_until_waiting_for_required_lock(), that checks whether the query has reached the intended blocking point, i.e. is waiting for the lock to be granted. Only after this state is reached do we execute the next command to unblock it. Currently the isolation2 framework provides no direct support for such blocking behavior, so in the short term adding this common helper function is a good step; in the long term we can, as the isolation framework does, add something simple and generic for this. That said, I like that the current helper checks explicitly for the exact lock type, relation, and so on.
-
Committed by Jinbao Chen
There are some problems with the old multi-stage agg cost model:
1. We used the global group number and global work memory to estimate the number of spilled tuples, but the situations in the first-stage and second-stage aggs are completely different. In the first-stage agg, tuples are randomly distributed on the group key, so the number of groups on each segment is almost equal to the global group number; in the second-stage agg, the distribution key is a subset of the group key, so the number of groups on each segment equals (global group number / segment number). The old code could therefore cause huge cost deviations.
2. Using ((group number + input rows) / 2) as the spilled tuple count is too rough.
3. Using global group number * 1.3 as the output rows of the streaming agg node is very wrong; the output rows of the streaming agg node should be group number * segment number * param.
4. We used numGroups to estimate the initial size of the hash table in the exec node, but numGroups is the global group number.
So we made the following changes:
1. Use a function 'groupNumberPerSegemnt' to estimate the group number per segment in the first-stage agg, and numGroups/segment number as the group number per segment in the second-stage agg.
2. Use the function 'spilledGroupNumber' to estimate the spilled tuple number.
3. Use spilled tuple number * segment number as the output tuple number of the streaming agg node.
4. Use numGroups as the group number per segment.
Also, we have information on the number of tuples in the top N groups, so we can predict the maximum number of tuples on the biggest segment when skew occurs. When we can predict skew, enable the 1-phase agg. Co-authored-by: Zhenghua Lyu <kainwen@gmail.com>
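In rough notation (mine, not the commit's), with global group count $d$ and segment count $N$, the per-segment group estimate described above is:

$$
d_{seg} \approx \begin{cases}
d & \text{first stage (tuples randomly distributed on the group key)}\\
d / N & \text{second stage (distribution key is a subset of the group key)}
\end{cases}
$$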
-
Committed by Ning Yu
The partition deadlock tests use hard-coded distribution values to form waiting relations on different segments; we can't easily tell whether two rows are on the same segment or not. Even worse, the hard-coded values are only correct when the cluster has the default size (count of primaries) and uses the default hash and reduce methods. In the GDD tests we should instead use a helper function, segid(segid, nth), which returns the nth value stored on segment segid; it is easier to design and understand the tests with it (see the sketch below). Also put more rows into the test tables, so that segid() can always return a valid row.
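A hedged sketch of the helper in use (table name invented; the helper's signature is as described above):

```sql
CREATE TABLE t (c1 int, c2 int) DISTRIBUTED BY (c1);
INSERT INTO t SELECT i, i FROM generate_series(1, 100) i;
-- segid(0, 1): the first c1 value that lands on segment 0, whatever the
-- cluster size or hash method -- no hard-coded literals needed
UPDATE t SET c2 = c2 + 1 WHERE c1 = segid(0, 1);
-- segid(1, 2): the second c1 value on segment 1, guaranteed to be on a
-- different segment than the row above
UPDATE t SET c2 = c2 + 1 WHERE c1 = segid(1, 2);
```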
-
Committed by Ning Yu
To trigger a deadlock we need to construct several waiting relations; once the last waiting relation is formed, the deadlock becomes detectable by the deadlock detector. In update-deadlock-root-leaf-concurrent-op and delete-deadlock-root-leaf-concurrent-op we used to use `2&:` for the last waiting relation, so the isolation2 framework would check that the query blocks, that is, that it does not return a result within 0.5 seconds. However, the deadlock detector may be triggered within that 0.5 seconds, in which case the isolation2 framework reports a failure, making the tests flaky. To make these tests deterministic we should use `2>:` for the last waiting query; it puts the query in the background without the blocking check, as in the sketch below.
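The change in isolation2 terms, as a hedged sketch (session number and statement invented):

```sql
-- before: '&' forks the query and asserts it is still blocked after 0.5s,
-- which fails if the deadlock detector fires within that window
2&: UPDATE t SET c2 = c2 + 1 WHERE c1 = segid(1, 1);
-- after: '>' simply puts the query in the background, with no check
2>: UPDATE t SET c2 = c2 + 1 WHERE c1 = segid(1, 1);
```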
-
Committed by Ashwin Agrawal
This reverts commit 265bc393. We need to push the version of the commit that moves the function into setup and avoids including server_helpers.sql in these two tests. Will push a fresh commit with that change.
-
Committed by Ashwin Agrawal
-
Committed by Ashwin Agrawal
-
Committed by Ashwin Agrawal
Add a helper, wait_until_waiting_for_required_lock(), to check whether a query has reached the intended blocking point, i.e. is waiting for a lock to be granted. Only after this state is reached do we execute the next command to unblock it. Currently the isolation2 framework provides no direct support for such blocking behavior, so in the short term adding this common helper function is a good step; in the long term we can, as the isolation framework does, add something simple and generic for this. That said, I like that the current helper checks explicitly for the exact lock type, relation, and so on.
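For flavor, a minimal sketch of what such a helper could look like; this is an assumption, not the committed definition (it polls only the local pg_locks, whereas the real helper would also need to cover segment-level locks):

```sql
CREATE OR REPLACE FUNCTION wait_until_waiting_for_required_lock(
    p_relname text, p_mode text) RETURNS boolean AS $$
DECLARE
    waiting boolean := false;
BEGIN
    FOR i IN 1 .. 600 LOOP  -- poll for up to ~60 seconds
        SELECT EXISTS (SELECT 1 FROM pg_locks
                        WHERE relation = p_relname::regclass
                          AND mode = p_mode
                          AND NOT granted)
          INTO waiting;
        EXIT WHEN waiting;
        PERFORM pg_sleep(0.1);
    END LOOP;
    RETURN waiting;
END;
$$ LANGUAGE plpgsql;
```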
-
- 11 September 2019, 2 commits
-
-
Committed by Hubert Zhang
In resource group mode, bypass queries such as SET commands use RESGROUP_BYPASS_MODE_MEMORY_LIMIT_ON_QD and RESGROUP_BYPASS_MODE_MEMORY_LIMIT_ON_QE to limit memory usage. But these two values are fixed, while the memory usage of bypass queries can accumulate within a session; as a result, the bypass memory limit is reached after many bypass queries have allocated memory whose lifecycle is session level. We introduce bypassMemoryLimitBase to make the bypass memory limit query-level instead of session-level.
-
Committed by Zhenghua Lyu
Extended queries are treated as cursors in Greenplum and dispatched to reader gangs. Under such conditions, we do not try to optimize a SELECT statement with a locking clause, which means we lock the whole table in Exclusive mode and do not emit a LockRows plan node. For more details, please refer to the mailing list: https://groups.google.com/a/greenplum.org/forum/#!msg/gpdb-dev/ugsZca1qLXU/CtUmzEa7CAAJ Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
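A hedged illustration of the behavior described above (table name invented):

```sql
-- Simple query protocol: the optimization can apply, so a LockRows node is
-- emitted and only the qualifying rows are locked.
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
COMMIT;
-- Extended query protocol (what most drivers use for parameterized
-- statements) is executed as a cursor on reader gangs, so the optimization
-- is skipped: the whole table is locked in Exclusive mode and no LockRows
-- node appears in the plan.
```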
-
- 10 September 2019, 1 commit
-
-
Committed by Ashwin Agrawal
If "./pg_regress --init-file init_file uao_ddl/compresstype_row" is executed, it fails, because the extra argument `--ao-dir=uao` needs to be passed to pg_regress to tell it to convert these .source files into row- and column-oriented .sql/.out files instead of applying the standard logic to them. This approach has the following shortfalls: - additional developer overhead to remember to add that option - without that option, .sql/.out files are still generated but are unusable / incorrect - row and column test file directories will always need the same prefix - the ao-dir prefix check ran for each and every file in the input/output directories, which is unnecessary. To improve the situation, modify the logic in pg_regress: no extra argument is needed anymore; instead, the presence of a special file named "GENERATE_ROW_AND_COLUMN_FILES" tells pg_regress to apply the special conversion rule to that directory. This way the correct conversion rule is always applied and developers don't need to specify any extra option.
-
- 06 September 2019, 1 commit
-
-
Committed by Ashwin Agrawal
In GPDB, gpstringsubs.pl is used to replace tokens in .source test files when creating the .sql/.out files for pg_regress. pg_regress itself also performs substitutions; the order is that pg_regress performs its substitutions first, and remaining ones are then performed by gpstringsubs.pl if required. The check for whether tokens remained to be replaced after pg_regress was based on whether the '@' character is present in the file; if so, gpstringsubs.pl was invoked for that file. gpstringsubs.pl is super slow and takes a long time. Many .source files contain keywords like `@Description` or `@db_name` (especially in UAO-related tests) which need no substitution, but the presence of these '@' keywords caused gpstringsubs.pl to be invoked for such files unnecessarily, causing the slowdown. So, modify the check to invoke gpstringsubs.pl only when an "@gp" keyword is present: any token replacement handled by gpstringsubs.pl must now start with "@gp". Hence, `@syslocale@` is renamed to `@gp_syslocale@`. Also, the replacement logic for `@hostname@` and `@gpcurusername@` is moved into pg_regress itself, as it's much faster and easier to do in C. The whole test suite takes a very long time in GPDB, and one needs to iterate on single tests routinely; these substitutions are invoked on every run of pg_regress (and notably isolation2), so this helps reduce the iteration time for running a single test during development, along with cutting the pg_regress file-conversion time in CI: for the regress directory from 8 seconds, and for isolation2 from 25 seconds, down to a few milliseconds. In the long run we should move all the logic in gpstringsubs.pl into pg_regress and kill gpstringsubs.pl.
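For illustration, a hypothetical .source fragment using the tokens the commit mentions (the statements themselves are invented):

```sql
-- now handled directly by pg_regress, in C:
CREATE TABLE token_demo (owner text DEFAULT '@gpcurusername@',
                         host  text DEFAULT '@hostname@');
-- still handled by gpstringsubs.pl, hence the new "@gp" prefix:
SET lc_monetary = '@gp_syslocale@';
-- plain '@' words like @Description or @db_name in comments no longer
-- trigger a gpstringsubs.pl pass over the file
```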
-
- 05 September 2019, 4 commits
-
-
Committed by Paul Guo
The CI pipeline uses two segment nodes for resgroup testing, which differs from the usual three-segment-node test configuration. Modify the test to make it independent of the segment count.
-
Committed by Paul Guo
Co-authored-by: Paul Guo <pguo@pivotal.io> Co-authored-by: Ning Yu <nyu@pivotal.io>
-
Committed by Paul Guo
Fix potential shared memory corruption in resource group slot management when a query involves entrydb. We've observed that when we terminate a query that involves entrydb, if the QD detaches from its resource group before the entrydb backend does so in UnassignResGroup(), data corruption in the shared slot pool can happen. In a cassert-enabled GPDB build, you see a FATAL message like the following when terminating the query using pg_terminate_backend():

```
FATAL: Unexpected internal error (resgroup.c:1545)
DETAIL: FailedAssertion("!(slot->nProcs == 0)", File: "resgroup.c", Line: 1545)
HINT: Process 9903 will wait for gp_debug_linger=120 seconds before termination.
Note that its locks and other resources will not be released until then.
```

On a production release without cassert enabled, some subsequent queries could lead to a panic. We fix this by letting the final process (either the QD or the entrydb process) do the cleanup once the nProcs reference count is zero. This is more robust and less bug-prone across future code changes and upstream merges. This patch refactors the related resource group code a bit, and also moves the fault-injection-based negative tests ahead so that we can capture some potential errors which happen later (typically the case in this patch). Besides, it fixes another panic caused by a potential MySessionState NULL reset (see the code comment for details); that one was revealed when running tests on my dev machine, so it was a potentially flaky testing spot. Co-authored-by: Paul Guo <pguo@pivotal.io> Co-authored-by: Ning Yu <nyu@pivotal.io> Co-authored-by: Gang Xiong <gxiong@pivotal.io>
-
Committed by Soumyadeep Chakraborty
This test was added to 5X_STABLE as a part of aee8cac8.
-
- 30 August 2019, 4 commits
-
-
Committed by Daniel Gustafsson
This extends our coverage by defining a build matrix in the Travis CI pipeline rather than just having a single job. The idea is to build PRs against combinations of common platforms and compilers in order to catch compiler errors and warnings early. The gist of the changes relative to the previous CI configuration: - the static configuration is replaced by per-build config in a matrix - ccache is removed to avoid caching issues - macOS is built on again - the version printing is more selective, printing versions only for binaries built in every job - notifications are removed to minimize spamming - builds with various unorthodox flags are supported - builds run silently to make the output more manageable - parts of the regression tests are executed via the new installcheck target in src/test/regress. Future tasks are to extend this to more combinations, to build client-side tooling on Windows, and to extend the test coverage to include more parts of ICW. Reviewed-by: Asim R P <apraveen@pivotal.io>
-
Committed by Jesse Zhang
This must have been a typo committed in commit 6b79a578. The original commit missed a trailing space in the output file after a newly added comment, which causes a regress diff when optimizer=off. I was going to fix it by adding the trailing space back, but on second thought I decided to gut the comment from all 3 files (one .sql and two .out files) to prevent future self-loathing. While we were at it, also remove the ever-expanding list of ignored DROP statements from the end of gporca.out. Honestly the `DROP` itself kind of offends me, but that's for another day. Let's get the build green for now.
-
Committed by Paul Guo
The test should proceed only after the 'forked' connection is really blocked on the lock. '&' means blocking, but the test framework has no way to know that the SQL is actually blocked waiting on the lock, so the test case has to take that responsibility itself. Reviewed-by: Asim R P <apraveen@pivotal.io>
-
Committed by Bhuvnesh Chaudhary
While processing a constraint interval, also consider predicates of the form <cast(ident)> array cmp; otherwise we lose the opportunity to generate implied quals. This corresponds to the ORCA changes.
-
- 29 August 2019, 4 commits
-
-
Committed by Ashwin Agrawal
- arrange the entries alphabetically - separate upstream entries from GPDB-specific ones - add a few missing ones - also ignore only in the current directory or sub-directories
-
Committed by Chris Hajas
Consider the query below:
```
test=# explain select * from foo, jazz where foo.c = jazz.e and jazz.f = 10 and a in (select b+1 from bar);
                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice4; segments: 3)  (cost=0.00..1324469.30 rows=1 width=20)
   ->  Hash Join  (cost=0.00..1324469.30 rows=1 width=20)
         Hash Cond: foo.c = jazz.e
         ->  Dynamic Table Scan on foo (dynamic scan id: 1)  (cost=0.00..1324038.29 rows=34 width=12)
               Filter: a = (b + 1) AND ((subplan))
               SubPlan 1
                 ->  Materialize  (cost=0.00..431.00 rows=1 width=4)
                       ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..431.00 rows=3 width=4)
                             ->  Limit  (cost=0.00..431.00 rows=1 width=4)
                                   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=4)
                                         ->  Limit  (cost=0.00..431.00 rows=1 width=4)
                                               ->  Table Scan on bar  (cost=0.00..431.00 rows=34 width=4)
         ->  Hash  (cost=100.00..100.00 rows=34 width=4)
               ->  Partition Selector for foo (dynamic scan id: 1)  (cost=10.00..100.00 rows=34 width=4)
                     ->  Broadcast Motion 3:3  (slice3; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
                           ->  Table Scan on jazz  (cost=0.00..431.00 rows=1 width=8)
                                 Filter: f = 10
 Optimizer status: PQO version 3.65.0
(18 rows)
```
Previously, since the subplan was in a qual, we did not populate the qual properly when executing a dynamic table scan node. Thus the subplan attribute in the PlanState of the dynamic table scan was incorrectly set to NULL, causing a later crash. We now populate this similarly to how we do it for dynamic index/bitmap scans. Co-authored-by: Sambitesh Dash <sdash@pivotal.io> Co-authored-by: Chris Hajas <chajas@pivotal.io> Co-authored-by: Ashuka Xue <axue@pivotal.io>
-
Committed by Ashwin Agrawal
To enable or disable the GUC gp_enable_global_deadlock_detector, a restart is required; but this GUC is only used on the master, so just restart the master instead of the full cluster. This helps cut the test time by a minute. Also, in the process, remove the pg_sleep(2) calls: the GUCs gp_enable_global_deadlock_detector and gp_global_deadlock_detector_period can be set at the same time, so no separate time is needed to reload the config. Also remove prepare-for-local, as only one test (local-deadlock-03) exists for local locks; prepare for it directly inside that sql file.
-
Committed by Ashwin Agrawal
The prepare-for-local test resets the gp_global_deadlock_detector_period GUC to its default of 2 minutes; because of that, any GDD test after it takes at least 2 minutes. Instead, flip the order. Ideally prepare-for-local wouldn't need to reset the GUC at all: local-deadlock-03 could use a fault injector to keep GDD from kicking in, but that's for a separate commit. This shaves 6 minutes off the isolation2 test time on my laptop.
-