- 13 Jan, 2018 40 commits
-
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
These were left over when Persistent Tables and Filerep were removed.
-
Committed by Heikki Linnakangas
What was left of it was a very thin and leaky abstraction, plus WAL-logging functions. Move the WAL-logging functions to a new file called cdbappendonlyxlog.c, and dismantle the MirroredAppendOnlyOpen abstraction.
-
Committed by Heikki Linnakangas
Instead of waiting for the primary and mirror to have the exact same LSN, add logic to retry the file comparisons a few times if there are any differences. This is a natural continuation of the earlier retry loops I added there, but now the LSN checks no longer expect the primary and mirror to sync on a particular value, and we retry during the comparison itself rather than while trying to sync the LSNs. This makes it possible to run gp_replica_check on a running cluster, while modifying tables. (The extra checkpoints it emits will have a performance impact on the other queries, though.) I tested this by running pgbench at the same time. You'll get a few NOTICEs about mismatches, but those are harmless. After a few automatic retries, it eventually passes.
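The retry-during-comparison idea described in this commit message can be sketched roughly as follows. This is a minimal illustration in Python, not the actual gp_replica_check code: `compare_with_retries`, `find_mismatches`, `max_attempts` and `delay_seconds` are hypothetical names, and the comparison itself is passed in as a callable.

```python
import time
from typing import Callable, List


def compare_with_retries(find_mismatches: Callable[[], List[str]],
                         max_attempts: int = 5,
                         delay_seconds: float = 0.0) -> bool:
    """Run a comparison pass until it reports no mismatches.

    find_mismatches is a hypothetical callable that compares primary and
    mirror files and returns the list of files that currently differ.
    Returns True as soon as one pass is clean, False if every attempt
    still found differences.
    """
    for attempt in range(1, max_attempts + 1):
        mismatches = find_mismatches()
        if not mismatches:
            return True
        # A mismatch on a busy cluster may simply mean the mirror has not
        # replayed the latest WAL yet, so report it and try again.
        print(f"NOTICE: attempt {attempt}/{max_attempts} found "
              f"{len(mismatches)} mismatch(es)")
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return False
```

With a comparator that reports differences on the first passes and a clean result later, the function returns True after a few NOTICEs, which matches the behaviour described for the pgbench test run.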
-
Committed by Heikki Linnakangas
Might as well call FileTruncate directly.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
Revert the code to open/read/write regular files, to the way it's in the upstream.
-
Committed by Heikki Linnakangas
It's now unused.
-
Committed by Heikki Linnakangas
WAL could be created e.g. by checkpoints, or some background activity that sets hint bits. Such activity might cause a failure, if a data file is modified in the master, but the change has not been replayed in the standby yet. But just because it can make our check fail doesn't mean we need to treat it as an automatic failure. Keep the warning, but consider the test a success if the check itself found nothing wrong.
-
Committed by Heikki Linnakangas
I removed the CHECKPOINT calls from the python script yesterday, replacing them with RequestCheckpoint() in the UDF itself. But I didn't use the CHECKPOINT_WAIT flag, so it might go ahead with the checking before the checkpoint has run. That might explain the gp_replica_check failures we're seeing in the pipeline now.
-
Committed by Heikki Linnakangas
Move the checkpoint-retry logic to within get_synced_lsns(), so that it applies to the synced LSN we get after running all the checks, too. When I added the retry logic to the get_synced_lsns() call before the checks, I didn't realize that there's a second call after the checks. This hopefully fixes the "WARNING: unable to obtain end synced LSN values between primary and mirror" messages we're still occasionally seeing in the pipeline.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
I removed the autoconf flag and #ifdefs earlier, but missed these.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
WAL replication is the name of the game on this branch.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
And clean up some comments that talked about persistent tables.
-
Committed by Heikki Linnakangas
They were not kept up-to-date anymore anyway. Remove the actual tables. There are still a few references to these tables in the management tools. AFAICS they're in tests, and I was hesitant to remove them just yet, in case we're going to use the existing tests as a guide when writing new tests.
-
Committed by Heikki Linnakangas
gp_replica_check would often get stuck, waiting for the standby to apply all the WAL it was sent. However, there is nothing to force a WAL flush in the master. Usually, the last record in a transaction is a transaction commit, which is flushed, and many other things cause a WAL flush too, but when running the regression suite, often the last WAL record is a WAL-logged hint bit update, just after a checkpoint. To work around that, if the standby doesn't catch up in 20 seconds, issue a CHECKPOINT in the master, to force a WAL flush. Something more lightweight could be used to flush the WAL, but gp_replica_check needs the data on disk to be up to date, so a checkpoint seems like a good idea. In fact, perhaps we should always issue a CHECKPOINT, even before the first attempt. Currently the Python script does that, but now it seems redundant.
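The workaround described above (poll for the standby to catch up, and force a checkpoint if it has not done so within 20 seconds) can be sketched as below. This is only an illustration under stated assumptions: `wait_for_standby`, `caught_up`, `force_checkpoint` and `max_checkpoints` are hypothetical names, and the real check is implemented inside gp_replica_check rather than with Python callables.

```python
import time
from typing import Callable


def wait_for_standby(caught_up: Callable[[], bool],
                     force_checkpoint: Callable[[], None],
                     timeout_seconds: float = 20.0,
                     poll_seconds: float = 0.5,
                     max_checkpoints: int = 3) -> bool:
    """Wait for the standby to apply all WAL, nudging it with checkpoints.

    caught_up is a hypothetical probe that returns True once the standby
    has applied everything it was sent. force_checkpoint is a hypothetical
    hook that issues CHECKPOINT on the master, which also flushes WAL.
    """
    checkpoints_issued = 0
    while True:
        deadline = time.monotonic() + timeout_seconds
        while time.monotonic() < deadline:
            if caught_up():
                return True
            time.sleep(poll_seconds)
        # The standby did not catch up within the timeout. The last WAL
        # record may simply never have been flushed, so force a flush.
        if checkpoints_issued >= max_checkpoints:
            return False
        force_checkpoint()
        checkpoints_issued += 1
```

Bounding the number of forced checkpoints is a choice made for this sketch so that the loop always terminates; the commit message does not say how many times the real code retries.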
-
Committed by Heikki Linnakangas
An AOCO table doesn't have a '0' segfile at all. Therefore, using smgrexists() to check if a relation exists on disk does not work.
-
Committed by Taylor Vesely
Now that we are removing the persistent tables, these tests no longer make sense.
Author: Taylor Vesely <tvesely@pivotal.io>
Author: Ashwin Agrawal <aagrawal@pivotal.io>
-
Committed by Ashwin Agrawal
Since AO/CO file creation generates an xlog record, update the answer file.
-
Committed by Heikki Linnakangas
This hopefully fixes the gp_replica_check failures we're seeing in the pipeline.
-
Committed by Heikki Linnakangas
An empty segfile is mostly treated the same as a missing segfile, but for the sake of gp_replica_check, WAL-log the creation of an empty segfile anyway, so that there is no inconsistency between master and mirror, such that an empty segfile exists on master, but it's missing entirely in the mirror. (I'm not entirely sure if there is non-testing code that requires that, too, so better safe than sorry). This should fix the warnings like this: WARNING: Unable to open file /tmp/build/e18b2f02/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/base/16384/61117.1152 from gp_replica_check. (There are other failures still.)
-
Committed by Heikki Linnakangas
If there is some unused space at the end of a WAL page, because we never split a WAL record header across pages, the WAL receiver's flush and apply positions were reported inconsistently. The flush position would report the end of the page, including the unused padding, while the apply position would only go up to the end of the last WAL record on the page, excluding the padding. If you compared the flush and apply positions for equality, it would look as if not all of the WAL had been applied yet, even though the difference between the pointers was just the unused padding space. This will get fixed in PostgreSQL 9.3, where the padding at the end of a WAL page is eliminated, but until then, tweak the reporting of the apply position to also include any end-of-page padding. That makes the flush == apply comparison a valid way to check if all the flushed WAL has been applied, even at page boundaries. I believe this explains the "unable to obtain start synced LSN values between primary and mirror" failures we've been seeing from the gp_replica_check test. gp_replica_check waits for apply == flush, and if the last WAL record lands at a page boundary, that condition never became true because of the padding. (Although I'm not sure why it used to work earlier, or did it?)
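The arithmetic behind the apply-position tweak can be illustrated with a small sketch. It makes several simplifying assumptions that are not in the commit message: WAL positions are treated as flat byte offsets, `RECORD_HEADER_SIZE` is a made-up value chosen for illustration, and `reported_apply_position` is a hypothetical helper rather than a function from the source tree. Only the 8192-byte page size reflects PostgreSQL's default `XLOG_BLCKSZ`.

```python
XLOG_BLCKSZ = 8192        # PostgreSQL's default WAL page size in bytes
RECORD_HEADER_SIZE = 32   # assumed record header size, for illustration only


def reported_apply_position(end_of_last_record: int) -> int:
    """Return the apply position, counting end-of-page padding as applied.

    If the space left on the page after the last replayed record is too
    small to hold a record header, no record can start there, so the
    remainder of the page is padding. Reporting the end of the page keeps
    the apply position comparable with the flush position.
    """
    offset_in_page = end_of_last_record % XLOG_BLCKSZ
    if offset_in_page == 0:
        # The record ended exactly on a page boundary: nothing to pad.
        return end_of_last_record
    free_space = XLOG_BLCKSZ - offset_in_page
    if free_space < RECORD_HEADER_SIZE:
        return end_of_last_record + free_space
    return end_of_last_record
```

Under these assumptions, a record ending 16 bytes before a page boundary is reported as ending at the boundary, so a flush position that points at the end of the page compares equal to the apply position.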
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
Because persistent tables are no more. NOTE: It would still be nice to check for consistency between pg_class and files on disk, to check that there are no extra data files, and no data files missing that have a pg_class entry. Same with AO seg files, I suppose. But that's a significantly different query than what we have here.
-
Committed by Ashwin Agrawal
Resolving the GPDB_84_MERGE_FIXME now that we closely match upstream. Without this fix, the relation files were not dropped during recovery or replay on mirrors.
-
Committed by Heikki Linnakangas
Because it's no longer created by the MMXLOG records. Alternatively, we could have a separate WAL record type for the creation. But this will do for now.
-
Committed by Heikki Linnakangas
* Need to set relFileNode field correctly in MirroredAppendOnlyOpen, along with the File descriptor itself. Otherwise the relfilenode is set incorrectly in WAL records.
* Pretend that filespace location is always "tblspc_dummy_<tablespace oid>". The filespace/tablespace stuff is quite broken ATM, but hopefully this at least avoids some crashing.
-
Committed by Heikki Linnakangas
The fault injection points used in the test didn't exist anymore. Add a new injection point in RecordTransactionCommit(), just before writing the commit WAL record, and use that in the test. Remove a bunch of fault injection IDs that are no longer used. (They are still referenced in some TINC tests, but the injection points don't exist anymore, so those tests will need to be rewritten if we want to keep them.)
-
Committed by Heikki Linnakangas
It was done in the later startup passes, which were removed. Add the call to where it is in the upstream. Fix a compiler warning about an unused variable in passing.
-
Committed by Heikki Linnakangas
-
Committed by Heikki Linnakangas
* Revert almost all the changes in smgr.c / md.c, to not go through the Mirrored* APIs.
* Remove mmxlog stuff. Use upstream "pending relation deletion" code instead.
* Get rid of multiple startup passes. Now it's just a single pass like in the upstream.
* Revert the way database drop/create are handled to the way it is in upstream. Doesn't use PT anymore, but accesses the file system directly, and WAL-logs a single CREATE/DROP DATABASE WAL record.
* Get rid of MirroredLock.
* Remove a few tests that were specific to persistent tables.
* Plus a lot of little removals and reverts to upstream code.
-
Committed by Ashwin Agrawal
With the file replication code removed, WAL replication is now the only HA system for Greenplum. Ideally we would remove the config option, as it's no longer a choice, but keep it for now, since all the code checking for it needs to be modified at some point anyway.
-
Committed by Ashwin Agrawal
This just removes already decoupled code for filerep; a lot more FTS code needs to be cleaned up for filerep removal.
-
Committed by Ashwin Agrawal
-