- 13 Jan, 2018 40 commits
-
-
Committed by Heikki Linnakangas
The test was sensitive to the number of pages in the pg_rewrite system table's index, for no good reason. Also, don't create a new database for it, to speed it up.
-
Committed by Heikki Linnakangas
It was only used by the filerep code. Now that that's gone, this was just dead code.
-
Committed by Jacob Champion
The DB_IN_STANDBY_NEW_TLI_SET state doesn't really seem to do anything anymore, as of commit 813b817cc. Remove it entirely to get rid of an assertion during standby tests. Also remove multipass function declarations; they're gone.
-
Committed by Ashwin Agrawal
This test passed locally, and also in the PR pipeline and a forked pipeline multiple times, but it fails intermittently in the main CI pipeline, so it is being disabled. The failure happens when the master connects to the segments for the first time after the PANIC, and it fails with:
```
+LOG: could not connect to segment: initialization of segworker group failed (cdbgang.c:235)
+LOG: could not connect to segment: initialization of segworker group failed (cdbgang.c:235)
2018-01-02 02:44:16.225927 UTC,"gpadmin","isolation2test",p33565,th-1615808736,"[local]",,2018-01-02 02:44:16 UTC,0,con640,,seg-1,,,,sx1,"FATAL","XX000","DTM initialization: failure during startup recovery, retry failed, check segment status (cdbtm.c:1537)",,"Process 33565 will wait for gp_debug_linger=120 seconds before termination. Note that its locks and other resources will not be released until then.",,,,,0,,"cdbtm.c",1537,"Stack trace:
1 0x9c1afb postgres errstart + 0x1db
2 0x9c3ca9 postgres elog_finish + 0xb9
3 0xadedea postgres initTM (cdbtm.c:1536)
4 0x9dac77 postgres InitPostgres + 0x857
5 0x8867a7 postgres PostgresMain + 0x207
6 0x81c97d postgres <symbol not found> (postmaster.c:0)
7 0x81eea2 postgres PostmasterMain + 0xc42
8 0x73f0a1 postgres main (main.c:206)
9 0x7f909b4fbd1d libc.so.6 __libc_start_main + 0xfd
10 0x4bf2b5 postgres <symbol not found> + 0x4bf2b5
```
Investigating; will re-enable this newly added test once we find out why the first connection from the master is failing.
-
Committed by Jacob Champion
Unfortunately the cluster crashes anyway two tests later. Rather than comment out half the tests to get a fake green, put this set of tests back. We'll just have to solve this one problem at a time. This reverts commit 5982a72614492916187ca27fc660d7cc7e3b69e1.
-
Committed by Jacob Champion
The promotion logic that gpactivatestandby relies on doesn't work yet, and when these tests fail, they leave the cluster completely unusable.
-
Committed by Ashwin Agrawal
This test in TINC is very shaky, as it brings down the primary and mirror and hence affects gp_segment_configuration. The test intends to fail the broadcast of COMMIT PREPARED to one segment and hence trigger a PANIC in the master while completing phase 2 of 2PC. The master's recovery cycle should correctly broadcast COMMIT PREPARED again, because the master should find the distributed commit record in its xlog during recovery. Verify that the transaction is committed after recovery. This scenario used to create cluster inconsistency due to a bug that is now fixed: the transaction used to get committed on all segments except the one where the COMMIT PREPARED broadcast failed before recovery. The master used to miss sending the COMMIT PREPARED across the restart and instead aborted the transaction after querying in-doubt prepared transactions from the segments.
-
Committed by Ashwin Agrawal
To support writing tests where a session can cause a PANIC on the master, add retry logic while establishing the connection in isolation2. This helps to keep the tests simple.
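The retry idea can be sketched in a few lines of Python. This is only an illustration and not the isolation2 code: `connect_with_retry`, its parameters, and their defaults are hypothetical names, and `connect` stands for any zero-argument callable that returns a connection or raises.

```python
import time


def connect_with_retry(connect, retries=5, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return connect()
        except Exception as exc:  # a real driver would raise a narrower type
            last_error = exc
            if attempt < retries:
                # Give the server time to finish crash recovery.
                sleep(delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

With the retry in a helper like this, a test that deliberately panics the master can simply open its next session without any explicit wait step.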
-
Committed by Ashwin Agrawal
-
Committed by Ashwin Agrawal
If FTS detects a primary as down, it retries n times before marking it down. A mirror, however, gets marked down as soon as its connection to the primary has not been made or has been lost. This surfaced as a problem mostly during cluster start (gpstart), where the sequence is to start the primaries and mirrors, followed by the master. In many instances, when the master probed a primary, the mirror connection was yet to be made, so a mirror that was up in the configuration was unnecessarily marked down, even if the mirror established its connection to the primary just a few seconds later. To avoid such situations, and to be a little more resilient against minor network glitches, add a variable that records when initialization or disconnection happened. The FTS probe uses it to find out how long the mirror has not shown up. Only if the mirror has not shown up for the allowed period (30 seconds for now) is it reported as down; otherwise FTS is asked to retry the probe. This logic doesn't affect the regular flow, and it also avoids any waiting in utilities for specific states after a cluster restart.
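The grace-period decision described above can be sketched as follows. This is a simplified Python illustration, not the actual FTS code (which is C): the function name, the string results, and the parameter names are hypothetical, and only the 30-second period comes from the commit message.

```python
MIRROR_GRACE_PERIOD_SECS = 30  # the allowed period named in the commit message


def mirror_probe_result(mirror_connected, last_change_time, now,
                        grace_period=MIRROR_GRACE_PERIOD_SECS):
    """Decide what one probe should report about the mirror.

    last_change_time is when the primary was initialized or last lost its
    mirror connection; all names here are hypothetical.
    """
    if mirror_connected:
        return "up"
    if now - last_change_time < grace_period:
        # The mirror may still be connecting: ask FTS to probe again.
        return "retry"
    return "down"
```

For example, a probe that arrives 5 seconds after startup with no mirror connected yields `"retry"` rather than `"down"`, which is the gpstart case the commit describes.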
-
Committed by Ashwin Agrawal
If the postmaster.pid file is present, reload fails with the error "No such process". But if postmaster.pid is not present, the error returned is "pg_ctl: PID file "......../postmaster.pid" does not exist". So it's better not to check for any particular error message, and to just report the segments that failed to reload.
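A minimal sketch of that approach, assuming each reload attempt yields an exit status and its stderr text, and that failure is signalled by a non-zero status. The function name and the input shape are hypothetical and not taken from the management utilities.

```python
def failed_reloads(results):
    """Return the segments whose reload failed, ignoring the error text.

    results maps a segment name to (return_code, stderr_text); this shape
    is an assumption made for the example.
    """
    return sorted(seg for seg, (rc, _stderr) in results.items() if rc != 0)
```

Because the message text is never inspected, both failure modes quoted above are reported the same way.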
-
Committed by Ashwin Agrawal
GPDB skips databases that cannot be connected to when computing the oldest database in vac_truncate_clog(). Make write_database_file(), which had been reverted to the upstream version, do the same. This helps to get the storage tests green for now. Later we can figure out how to uniformly remove this code from vac_truncate_clog() and write_database_file(), if a better solution is found to the original issue for which this check was added.
-
Committed by Ashwin Agrawal
-
Committed by Ashwin Agrawal
With walrep we have a new state, 'n' (not in sync). Add the valid states corresponding to it to let some tests pass. A lot more cleanup of this area is needed to remove filerep-specific states, but that's work for a different commit.
-
Committed by Ashwin Agrawal
Since the gp_relation_node table no longer exists, there is no point testing whether an ERROR is reported when pg_aoseg and gp_relation_node are not in sync.
-
Committed by Ashwin Agrawal
The test is specific to filerep behavior where truncate was not properly resynced, causing the problem.
-
Committed by Ashwin Agrawal
-
Committed by Asim R P
FTS scan marks the stopped mirror as down so that subsequent recoverfull works.
-
Committed by Ashwin Agrawal
This removes the make target storage_filerep.
-
Committed by Ashwin Agrawal
This test is no longer relevant with WAL replication. This should get the `aocoalter_catalog_loaders` task in the Storage schedule green.
-
Committed by Ashwin Agrawal
Now that gpstop/gpstart works for WAL replication, remove segspace from --exclude-tests. filespace is the only one that remains in the --exclude-tests list, and it will go away soon as well.
-
Committed by Jimmy Yih
All that was needed was to make sure mirrors are not started with pg_ctl -w flag since the mirror is in recovery mode and will not respond to PQPing messages. Author: Jimmy Yih <jyih@pivotal.io> Author: Marbin Tan <mtan@pivotal.io>
-
Committed by Jimmy Yih
With file replication gone, gpinitsystem should no longer try to initialize the cluster through the filerep sequence. The sequence now goes as follows:
1. Create and start master in master-only mode.
2. Create primaries and register to master.
3. Stop master.
4. Run gpstart to start master and primaries.
5. Create mirrors w/ pg_basebackup and register to master.
6. Start the mirrors and wait until primaries and mirrors sync.
Author: Jimmy Yih <jyih@pivotal.io> Author: Marbin Tan <mtan@pivotal.io>
-
Committed by Heikki Linnakangas
pg_partitions contains calls to pg_get_expr() function. That function suffers from a race condition: If the relation is dropped between the get_rel_name() call, and another syscache lookup in pg_get_expr_worker(), you get a "relation not found" error. The error message is reasonable, and I don't see any easy fix for the pg_partitions view itself, so just try to avoid hitting that in the tests. For some reason we are hitting that frequently in this particular query. Change it to query pg_class instead, it doesn't use any of the more complicated fields from pg_partitions, anyway. I'm pushing this to the 'walreplication' branch first, because for some reason, we're seeing the failure there more often than on 'master'. If this fixes the problem, I'll push this to 'master', too.
-
Committed by Heikki Linnakangas
-
Committed by Max Yang
Author: Max Yang <myang@pivotal.io> Author: Xiaoran Wang <xiwang@pivotal.io>
-
Committed by Max Yang
Author: Max Yang <myang@pivotal.io> Author: Xiaoran Wang <xiwang@pivotal.io>
-
Committed by Max Yang
Currently we start the standby master when WITH_MIRRORS=true. This makes the fake WAL receiver error out with: number of requested standby connections exceeds max_wal_senders (currently 1), because the standby master already uses one WAL sender. To make the test pass, we remove the standby master at the beginning of this test and restore it at the end. A better solution may be to make this value configurable at startup time, but this is just a simple fix to get the test passing. Author: Max Yang <myang@pivotal.io> Author: Xiaoran Wang <xiwang@pivotal.io>
-
Committed by Max Yang
Since we start the standby master if WITH_MIRRORS=true, the number of entries in gp_segment_configuration changes, which in turn changes the answer file. Author: Max Yang <myang@pivotal.io> Author: Xiaoran Wang <xiwang@pivotal.io>
-
Committed by Max Yang
Author: Max Yang <myang@pivotal.io> Author: Xiaoran Wang <xiwang@pivotal.io>
-
Committed by Asim R P
The last commit removed the replication ports (replacing them with -1 in the Python utilities), and those numbers were being checked as part of this test. Comment the checks out and tag with FIXMEs. Author: Asim R P <apraveen@pivotal.io> Author: Jacob Champion <pchampion@pivotal.io>
-
Committed by Heikki Linnakangas
At least with gpdemo, on my laptop. We really shouldn't need these filerep port numbers anymore, right?
-
Committed by Heikki Linnakangas
And the mechanism in initdb and gpinitsystem to set it. It's no longer used for anything.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
These were left over when Persistent Tables and Filerep were removed.
-
Committed by Heikki Linnakangas
What was left of it, was a very thin and leaky abstraction, plus WAL-logging functions. Move the WAL-logging functions to a new file called cdbappendonlyxlog.c, and dismantle the MirroredAppendOnlyOpen abstraction.
-
Committed by Heikki Linnakangas
Instead of waiting for the primary and mirror to have the exact same LSN, add logic to retry the file comparisons a few times if there are any differences. This is a natural continuation of the earlier retry-loops I added there, but now the LSN checks are made so that we don't even expect the primary and mirror to sync on a particular value, and we retry not while trying to sync the LSNs, but during the comparison itself. This makes it possible to run gp_replica_check on a running cluster, while modifying tables. (The extra checkpoints it emits will have a performance impact on the other queries, though.) I tested this by running pgbench at the same time. You'll get a few NOTICEs about mismatches, but those are harmless. After a few automatic retries, it eventually passes.
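The retry-during-comparison idea can be sketched like this. It is a hypothetical Python illustration, not the gp_replica_check implementation: `compare` stands for any callable that returns the list of files that currently differ, and the retry count, delay, and NOTICE wording are assumptions.

```python
import time


def compare_with_retry(compare, retries=3, delay=1.0, sleep=time.sleep):
    """Return True once compare() reports no mismatching files.

    compare() returns a list of mismatching files (empty means a match).
    All names and defaults are hypothetical.
    """
    mismatches = compare()
    attempt = 1
    while mismatches and attempt < retries:
        # A mismatch may just mean the mirror has not replayed recent WAL yet.
        print("NOTICE: %d file(s) differ, retrying" % len(mismatches))
        sleep(delay)
        mismatches = compare()
        attempt += 1
    return not mismatches
```

Retrying the comparison itself, rather than first waiting for matching LSNs, is what lets a check of this shape tolerate concurrent writes: a transient difference only costs another attempt.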
-
Committed by Heikki Linnakangas
Might as well call FileTruncate directly.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
-