- 13 January 2018, 40 commits
-
-
Committed by Max Yang
Currently we start the standby master when WITH_MIRRORS=true, which makes the fake WAL receiver error out with "number of requested standby connections exceeds max_wal_senders (currently 1)", because the standby master already uses one WAL sender. To make the test pass, we remove the standby master at the beginning of this test and recover it at the end. A better solution might be to make this value configurable at startup time, but this is a simple fix to get the test passing. Author: Max Yang <myang@pivotal.io> Author: Xiaoran Wang <xiwang@pivotal.io>
-
Committed by Max Yang
Since we start the standby master when WITH_MIRRORS=true, the number of entries in gp_segment_configuration changes, which results in a change to the answer file. Author: Max Yang <myang@pivotal.io> Author: Xiaoran Wang <xiwang@pivotal.io>
-
Committed by Max Yang
Author: Max Yang <myang@pivotal.io> Author: Xiaoran Wang <xiwang@pivotal.io>
-
Committed by Asim R P
The last commit removed the replication ports (replacing them with -1 in the Python utilities), and those numbers were being checked as part of this test. Comment the checks out and tag with FIXMEs. Author: Asim R P <apraveen@pivotal.io> Author: Jacob Champion <pchampion@pivotal.io>
-
Committed by Heikki Linnakangas
At least with gpdemo, on my laptop. We really shouldn't need these filerep port numbers anymore, right?
-
Committed by Heikki Linnakangas
And the mechanism in initdb and gpinitsystem to set it. It's no longer used for anything.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
These were left over when Persistent Tables and Filerep were removed.
-
Committed by Heikki Linnakangas
What was left of it was a very thin and leaky abstraction, plus WAL-logging functions. Move the WAL-logging functions to a new file called cdbappendonlyxlog.c, and dismantle the MirroredAppendOnlyOpen abstraction.
-
Committed by Heikki Linnakangas
Instead of waiting for the primary and mirror to have the exact same LSN, add logic to retry the file comparisons a few times if there are any differences. This is a natural continuation of the earlier retry loops I added there, but now the LSN checks are made so that we don't even expect the primary and mirror to sync on a particular value, and we retry not while trying to sync the LSNs, but during the comparison itself. This makes it possible to run gp_replica_check on a running cluster, while modifying tables. (The extra checkpoints it emits will have a performance impact on the other queries, though.) I tested this by running pgbench at the same time. You'll get a few NOTICEs about mismatches, but those are harmless. After a few automatic retries, it eventually passes.
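The retry-during-comparison idea can be sketched in a few lines of Python. This is an illustrative model only, not the actual gp_replica_check code; the function and parameter names are made up.

```python
import time

def compare_with_retries(compare_fn, max_retries=3, delay_secs=0):
    """Run compare_fn() up to max_retries times, returning True as soon as
    one attempt reports no primary/mirror differences. Retrying here, in
    the comparison itself, avoids having to sync the LSNs to an exact
    value first."""
    for _ in range(max_retries):
        if compare_fn():
            return True
        time.sleep(delay_secs)
    return False
```

With a comparison that transiently fails (e.g. because a table is being modified concurrently), the loop absorbs the mismatch and eventually succeeds.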
-
Committed by Heikki Linnakangas
Might as well call FileTruncate directly.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
Revert the code that opens/reads/writes regular files to the way it is in upstream.
-
Committed by Heikki Linnakangas
It's now unused.
-
Committed by Heikki Linnakangas
WAL could be created e.g. by checkpoints, or by background activity that sets hint bits. Such activity might cause a failure, if a data file is modified in the master but the change has not been replayed in the standby yet. But just because it can make our check fail doesn't mean we need to treat it as an automatic failure. Keep the warning, but consider the test a success if the check itself found nothing wrong.
-
Committed by Heikki Linnakangas
I removed the CHECKPOINT calls from the python script yesterday, replacing them with RequestCheckpoint() in the UDF itself. But I didn't use the CHECKPOINT_WAIT flag, so it might go ahead with the checking before the checkpoint has run. That might explain the gp_replica_check failures we're seeing in the pipeline now.
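The race being fixed here can be modelled with a toy checkpointer. This is not GPDB code; it is a minimal sketch of why a wait flag matters when a checkpoint request completes asynchronously. The class and method names are invented for illustration.

```python
import threading

class FakeCheckpointer:
    """Toy model: a checkpoint request that completes in the background.
    With wait=False the caller may proceed before the checkpoint is done
    (the bug); with wait=True it blocks until completion, like the
    CHECKPOINT_WAIT flag."""
    def __init__(self):
        self._done = threading.Event()

    def request_checkpoint(self, wait):
        # The "checkpoint" finishes a little later, on another thread.
        threading.Timer(0.05, self._done.set).start()
        if wait:
            self._done.wait()

    def checkpoint_completed(self):
        return self._done.is_set()
```

With `wait=True`, any check run after `request_checkpoint()` returns is guaranteed to see the checkpoint's effects; with `wait=False` that ordering is not guaranteed, which matches the failure mode described above.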
-
Committed by Heikki Linnakangas
Move the checkpoint-retry logic to within get_synced_lsns(), so that it applies to the synced LSN we get after running all the checks, too. When I added the retry logic to the get_synced_lsns() call before the checks, I didn't realize that there's a second call after the checks. This hopefully fixes the "WARNING: unable to obtain end synced LSN values between primary and mirror" messages we're still occasionally seeing in the pipeline.
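Putting the retry inside the helper means both call sites (before and after the checks) get the same behaviour for free. A minimal Python sketch of that shape, with invented names and callbacks standing in for the real queries:

```python
def get_synced_lsns(fetch_lsns, force_checkpoint, max_retries=5):
    """Fetch (primary_lsn, mirror_lsn); if they differ, force a checkpoint
    and try again. Returning None corresponds to the 'unable to obtain
    ... synced LSN values' warning path."""
    for _ in range(max_retries):
        primary, mirror = fetch_lsns()
        if primary == mirror:
            return primary
        force_checkpoint()   # nudge the mirror forward, then re-check
    return None
```

Because the retry lives inside the function, a second caller added later cannot accidentally skip it, which is exactly the bug this commit fixes.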
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
I removed the autoconf flag and #ifdefs earlier, but missed these.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
WAL replication is the name of the game on this branch.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
And clean up some comments that talked about persistent tables.
-
Committed by Heikki Linnakangas
They were not kept up to date anymore anyway. Remove the actual tables. There are still a few references to these tables in the management tools. AFAICS they're in tests, and I was hesitant to remove them just yet, in case we're going to use the existing tests as a guide when writing new tests.
-
Committed by Heikki Linnakangas
gp_replica_check would often get stuck, waiting for the standby to apply all the WAL it was sent. However, there is nothing to force a WAL flush in the master. Usually, the last record in a transaction is a transaction commit, which is flushed, and many other things cause a WAL flush too, but when running the regression suite, often the last WAL record is a WAL-logged hint bit update, just after a checkpoint. To work around that, if the standby doesn't catch up in 20 seconds, issue a CHECKPOINT in the master, to force a WAL flush. Something more lightweight could be used to flush the WAL, but gp_replica_check needs the data on disk to be up to date, so a checkpoint seems like a good idea. In fact, perhaps we should always issue a CHECKPOINT, even before the first attempt. Currently the python script does that, but now it seems redundant.
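The "poll, and fall back to a checkpoint after a timeout" loop described above might look roughly like this in Python. This is a sketch under stated assumptions, not the actual gp_replica_check implementation; the callbacks and names are hypothetical.

```python
import time

def wait_for_standby(catch_up_check, force_checkpoint,
                     timeout_secs=20, poll_secs=0.5, clock=time.monotonic):
    """Poll until the standby has applied all sent WAL. If it hasn't
    caught up within timeout_secs, issue one CHECKPOINT on the master to
    force a WAL flush, then poll for one more timeout window."""
    checkpointed = False
    deadline = clock() + timeout_secs
    while True:
        if catch_up_check():
            return True
        if clock() >= deadline:
            if checkpointed:
                return False        # give up after the post-checkpoint window
            force_checkpoint()      # force a WAL flush on the master
            checkpointed = True
            deadline = clock() + timeout_secs
        time.sleep(poll_secs)
```

In the problem scenario, the standby only catches up once the checkpoint flushes the trailing hint-bit WAL record, so the first polling window times out and the fallback checkpoint unblocks the wait.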
-
Committed by Heikki Linnakangas
An AOCO table doesn't have a '0' segfile at all. Therefore, using smgrexists() to check if a relation exists on disk does not work.
-
Committed by Taylor Vesely
Now that we are removing the persistent tables, these tests no longer make sense. Author: Taylor Vesely <tvesely@pivotal.io> Author: Ashwin Agrawal <aagrawal@pivotal.io>
-
Committed by Ashwin Agrawal
Since AO/CO file creation now generates an xlog record, update the answer file.
-
Committed by Heikki Linnakangas
This hopefully fixes the gp_replica_check failures we're seeing in the pipeline.
-
Committed by Heikki Linnakangas
An empty segfile is mostly treated the same as a missing segfile, but for the sake of gp_replica_check, WAL-log the creation of an empty segfile anyway, so that there is no inconsistency between master and mirror, such that an empty segfile exists on master, but it's missing entirely in the mirror. (I'm not entirely sure if there is non-testing code that requires that, too, so better safe than sorry). This should fix the warnings like this: WARNING: Unable to open file /tmp/build/e18b2f02/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/base/16384/61117.1152 from gp_replica_check. (There are other failures still.)
-
Committed by Heikki Linnakangas
If there is some unused space at the end of a WAL page, because we never split a WAL record header, the WAL receiver's flush and apply positions were reported a bit oddly. The flush position would report the end of the page, including the unused padding, while the apply position would only go up to the end of the last WAL record on the page, excluding the padding. If you compared the flush and apply positions, it would look as if not all of the WAL had been applied yet, even though the difference between the pointers was just the unused padding space. This will get fixed in PostgreSQL 9.3, where the padding at the end of a WAL page is eliminated, but until then, tweak the reporting of the apply position to also include any end-of-page padding. That makes the flush == apply comparison a valid way to check whether all the flushed WAL has been applied, even at page boundaries. I believe this explains the "unable to obtain start synced LSN values between primary and mirror" failures we've been seeing from the gp_replica_check test. gp_replica_check waits for apply == flush, and if the last WAL record lands at a page boundary, that condition never became true because of the padding. (Although I'm not sure why it used to work earlier, or did it?)
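The padding-aware comparison can be illustrated with a small sketch. This is a simplification: real WAL positions are structured xlog pointers, and the 8192-byte page size and the assumption that any sub-page gap below a page-aligned flush position is padding are illustrative, not taken from the actual patch.

```python
WAL_PAGE_SIZE = 8192  # illustrative WAL block size

def fully_applied(apply_lsn, flush_lsn, page_size=WAL_PAGE_SIZE):
    """Treat the standby as fully applied if apply == flush, or if flush
    sits exactly on a page boundary and apply points into the page just
    before it, so the only bytes in between are end-of-page padding."""
    if apply_lsn == flush_lsn:
        return True
    return (flush_lsn % page_size == 0
            and 0 < flush_lsn - apply_lsn < page_size)
```

A plain `apply == flush` test would report "not caught up" forever in the padding case, which is the stuck condition described above.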
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
Because persistent tables are no more. NOTE: It would still be nice to check for consistency between pg_class and files on disk, to check that there are no extra data files, and no data files missing that have a pg_class entry. Same with AO seg files, I suppose. But that's a significantly different query than what we have here.
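The consistency check suggested in the note boils down to a set difference in both directions. A sketch of that core, with the catalog and filesystem scans left out (the inputs here are plain sets of relfilenode numbers, and all names are invented):

```python
def cross_check(catalog_relfilenodes, files_on_disk):
    """Report data files with no pg_class entry ('extra') and pg_class
    entries with no data file ('missing'). Gathering the two input sets
    from pg_class and the data directory is the hard part and is not
    shown here."""
    catalog = set(catalog_relfilenodes)
    disk = set(files_on_disk)
    return {
        "extra_files": sorted(disk - catalog),
        "missing_files": sorted(catalog - disk),
    }
```

An empty result in both lists would mean the catalog and the data directory agree; as the note says, doing the same for AO segment files would need a different scan.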
-
Committed by Ashwin Agrawal
Resolving the GPDB_84_MERGE_FIXME now that we match upstream closely. Without this fix, the relation files were not dropped during recovery or replay on mirrors.
-
Committed by Heikki Linnakangas
Because it's no longer created by the MMXLOG records. Alternatively, we could have a separate WAL record type for the creation. But this will do for now.
-
Committed by Heikki Linnakangas
* Need to set relFileNode field correctly in MirroredAppendOnlyOpen, along with the File descriptor itself. Otherwise the relfilenode is set incorrectly in WAL records. * Pretend that filespace location is always "tblspc_dummy_<tablespace oid>". The filespace/tablespace stuff is quite broken ATM, but hopefully this at least avoids some crashing.
-
Committed by Heikki Linnakangas
The fault injection points used in the test didn't exist anymore. Add a new injection point in RecordTransactionCommit(), just before writing the commit WAL record, and use that in the test. Remove a bunch of fault injection IDs that are no longer used. (They are still referenced in some TINC tests, but the injection points don't exist anymore, so those tests will need to be rewritten if we want to keep them.)
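The fault-injection pattern being used here can be modelled in miniature: code under test calls a trigger at interesting points, and a test arms a fault beforehand. This is a toy model only; the class, the actions, and the point name used in the test below are made up, not GPDB's actual fault-injector API or identifiers.

```python
class FaultInjector:
    """Minimal fault-injection registry. An unarmed point is a no-op, so
    triggers can be left permanently in the code path (here, e.g. just
    before writing a commit WAL record) at negligible cost."""
    def __init__(self):
        self._armed = {}

    def arm(self, name, action):
        self._armed[name] = action

    def trigger(self, name):
        action = self._armed.get(name)
        if action == "error":
            raise RuntimeError("injected fault at %s" % name)
        # other actions ("skip", "suspend", ...) could be modelled similarly
```

The key property, mirrored from the real mechanism, is that removing the code path also removes the injection point, which is why tests referencing deleted points must be rewritten rather than merely re-enabled.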
-