Commits · a1990a80af606257136c46f99caffcb41a347f6e · Greenplum / Gpdb

Jan 13, 2018 · 40 commits

Fix walrep test case failure. · a1990a80

Committed by Max Yang on Dec 21, 2017

Currently we start the standby master when WITH_MIRRORS=true, which
makes the fake WAL receiver error out with:
number of requested standby connections exceeds max_wal_senders (currently 1)
because the standby master already uses one WAL sender.
To make the test pass, we remove the standby master at the beginning of this
test and recover it at the end of the test.
A better solution might be to make this value configurable at startup time,
but this is just a simple fix to get the test passing.
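To illustrate why the fake WAL receiver fails, here is a minimal Python model of a fixed pool of WAL sender slots. It is a hypothetical sketch, not the actual walsender code: the class `WalSenderSlots` and its methods are invented for illustration, and only the error text is taken from the message quoted above. With `max_wal_senders` set to 1, the standby master occupies the only slot, so any further replication connection is rejected until that slot is released.

```python
class WalSenderSlots:
    """Hypothetical model of a fixed-size pool of WAL sender slots.

    Not GPDB's implementation; it only mimics the capacity check that
    produces the error quoted in the commit message.
    """

    def __init__(self, max_wal_senders):
        self.max_wal_senders = max_wal_senders
        self.in_use = []

    def connect(self, client):
        # Every replication connection needs its own WAL sender slot.
        if len(self.in_use) >= self.max_wal_senders:
            raise RuntimeError(
                "number of requested standby connections exceeds "
                f"max_wal_senders (currently {self.max_wal_senders})"
            )
        self.in_use.append(client)

    def disconnect(self, client):
        # Releasing a slot makes room for another connection.
        self.in_use.remove(client)


slots = WalSenderSlots(max_wal_senders=1)
slots.connect("standby master")
try:
    slots.connect("fake wal receiver")
except RuntimeError as err:
    print(err)

# Removing the standby master first, as the test now does, frees the slot.
slots.disconnect("standby master")
slots.connect("fake wal receiver")
```

Making `max_wal_senders` configurable at startup, as the message suggests, would correspond to constructing the pool with a larger limit instead of removing the standby.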

Author: Max Yang <myang@pivotal.io>
Author: Xiaoran Wang <xiwang@pivotal.io>

Fix bgwriter_checkpoint test case · 5aa2f649

Committed by Max Yang on Dec 21, 2017

Since we start the standby master when WITH_MIRRORS=true, the number of
rows in gp_segment_configuration changes, which changes the answer file.

Author: Max Yang <myang@pivotal.io>
Author: Xiaoran Wang <xiwang@pivotal.io>

Start standby master in create-demo-cluster when WITH_MIRRORS = true. · 0c7f1281

Committed by Max Yang on Dec 20, 2017

Author: Max Yang <myang@pivotal.io>
Author: Xiaoran Wang <xiwang@pivotal.io>

gpaddmirrors: fix unit tests · 5cc2ddd0

Committed by Asim R P on Dec 20, 2017

The last commit removed the replication ports (replacing them with -1 in
the Python utilities), and those numbers were being checked as part of
this test. Comment the checks out and tag with FIXMEs.

Author: Asim R P <apraveen@pivotal.io>
Author: Jacob Champion <pchampion@pivotal.io>

Quick fix to make gpstart work. · cadf63a8

Committed by Heikki Linnakangas on Dec 20, 2017

At least with gpdemo, on my laptop.

We really shouldn't need these filerep port numbers anymore, right?

Remove unused gp_initdb_mirrored variable. · c50fa05d

Committed by Heikki Linnakangas on Dec 20, 2017

And the mechanism in initdb and gpinitsystem to set it. It's no longer
used for anything.

Remove leftover LWLocks that are now unused. · 8a46029f

Committed by Heikki Linnakangas on Dec 20, 2017

Remove GUCs and fault injection points related to PT and filerep. · e8dc97d4

Committed by Heikki Linnakangas on Dec 20, 2017

These were left over when Persistent Tables and Filerep were removed.

Remove cdbmirroredappendonly.[ch]. · 334d41a8

Committed by Heikki Linnakangas on Dec 19, 2017

What was left of it was a very thin and leaky abstraction, plus WAL-logging
functions. Move the WAL-logging functions to a new file called
cdbappendonlyxlog.c, and dismantle the MirroredAppendOnlyOpen abstraction.

Add more robust retry logic to gp_replica_check, so that it can be run online. · f6d42b45

Committed by Heikki Linnakangas on Dec 19, 2017

Instead of waiting for the primary and mirror to have the exact same LSN,
add logic to retry the file comparisons a few times if there are any
differences. This is a natural continuation of the earlier retry loops I
added there, but now we don't even expect the primary and mirror to sync
on a particular LSN value: instead of retrying while trying to sync the
LSNs, we retry the comparison itself.

This makes it possible to run gp_replica_check on a running cluster while
tables are being modified. (The extra checkpoints it emits will have a
performance impact on other queries, though.) I tested this by running
pgbench at the same time. You'll get a few NOTICEs about mismatches, but
those are harmless. After a few automatic retries, it eventually passes.
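The retry-during-comparison idea can be sketched as a small Python loop. This is not gp_replica_check's actual code: `compare_with_retries`, the injected `compare_files` callback, and the attempt and delay defaults are all hypothetical. What it shows is that a mismatch only counts as a failure if it survives every retry, which is what lets the check tolerate concurrent writes.

```python
import time


def compare_with_retries(compare_files, max_attempts=5, delay_s=1.0,
                         notice=print, sleep=time.sleep):
    """Re-run a primary/mirror file comparison until it finds no differences.

    compare_files: hypothetical callback returning a list of mismatching
    file names (empty when primary and mirror agree).
    Returns True if some attempt found no mismatches, False otherwise.
    """
    for attempt in range(1, max_attempts + 1):
        mismatches = compare_files()
        if not mismatches:
            return True
        # Concurrent activity can cause transient differences; report them
        # as notices rather than failing immediately.
        notice(f"NOTICE: attempt {attempt}: {len(mismatches)} mismatching file(s)")
        if attempt < max_attempts:
            sleep(delay_s)  # give the mirror time to replay more WAL
    return False


# Two transient mismatches, then a clean comparison.
results = iter([["16384"], ["16384"], []])
ok = compare_with_retries(lambda: next(results), delay_s=0)
```

Emitting notices instead of errors on intermediate attempts mirrors the behaviour described above, where a few harmless NOTICEs precede an eventual pass.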

Remove MirroredAppendOnly_Truncate() function. · f882ee40

Committed by Heikki Linnakangas on Dec 19, 2017

Might as well call FileTruncate directly.

Remove unused fields, and README. · e95a8ada

Committed by Heikki Linnakangas on Dec 19, 2017

Remove some remnants of multi-pass recovery from postmaster.c. · 2aa8a67e

Committed by Heikki Linnakangas on Dec 18, 2017

Remove mirrored flatfile stuff. · 7a249d7e

Committed by Heikki Linnakangas on Dec 18, 2017

Revert the code that opens, reads, and writes regular files to the way it
is in upstream.

Remove gp_global_sequence tables. · f982fabe

Committed by Heikki Linnakangas on Dec 18, 2017

It's now unused.

It's not an automatic failure if some WAL records were created during the test. · 00ea0b87

Committed by Heikki Linnakangas on Dec 19, 2017

WAL could be created e.g. by checkpoints, or by some background activity
that sets hint bits. Such activity might cause a failure if a data file is
modified in the master but the change has not yet been replayed in the
standby. But just because it can make our check fail doesn't mean we need
to treat it as an automatic failure. Keep the warning, but consider the
test a success if the check itself found nothing wrong.

In gp_replica_check, wait for checkpoint to finish. · 189ca232

Committed by Heikki Linnakangas on Dec 19, 2017

I removed the CHECKPOINT calls from the python script yesterday, replacing
them with RequestCheckpoint() in the UDF itself. But I didn't use the
CHECKPOINT_WAIT flag, so it might go ahead with the checking before the
checkpoint has run. That might explain the gp_replica_check failures we're
seeing in the pipeline now.
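The difference between requesting a checkpoint and waiting for it can be modeled with a thread. In PostgreSQL's C code this is the `CHECKPOINT_WAIT` bit in the flags passed to `RequestCheckpoint()`; the Python `Checkpointer` class below is only a hypothetical analogy, not a binding to that function. Without waiting, the caller may start comparing files while the checkpoint is still running, which is the race described above.

```python
import threading
import time


class Checkpointer:
    """Hypothetical stand-in for a background checkpointer process."""

    def __init__(self, duration_s=0.05):
        self.duration_s = duration_s
        self.completed = 0
        self._lock = threading.Lock()

    def _run_checkpoint(self):
        time.sleep(self.duration_s)  # pretend to write out dirty buffers
        with self._lock:
            self.completed += 1

    def request_checkpoint(self, wait):
        worker = threading.Thread(target=self._run_checkpoint)
        worker.start()
        if wait:
            # Analogous to CHECKPOINT_WAIT: block until the checkpoint is done.
            worker.join()
        return worker


cp = Checkpointer()
cp.request_checkpoint(wait=True)
# Only now is it safe to compare data files: the checkpoint has finished.
```

With `wait=False` the call returns immediately and `completed` may still be 0 when the comparison starts.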

Add the same work-around to getting a "synced" LSN after checks, as before. · f21ecf52

Committed by Heikki Linnakangas on Dec 18, 2017

Move the checkpoint-retry logic to within get_synced_lsns(), so that it
applies to the synced LSN we get after running all the checks, too. When
I added the retry logic to the get_synced_lsns() call before the checks,
I didn't realize that there's a second call after the checks.

This hopefully fixes the "WARNING: unable to obtain end synced LSN values
between primary and mirror" messages we're still occasionally seeing
in the pipeline.

Remove one more check for enable_segwalrep that I missed. · f3f4c8fa

Committed by Heikki Linnakangas on Dec 18, 2017

Also remove references to enable_segwalrep in Makefiles. · 1697a640

Committed by Heikki Linnakangas on Dec 18, 2017

I removed the autoconf flag and #ifdefs earlier, but missed these.

Remove some stray references to MirroredLock · 953a358f

Committed by Heikki Linnakangas on Dec 18, 2017

Remove --disable-segwalrep option, and the #ifdefs. · db7e4020

Committed by Heikki Linnakangas on Dec 18, 2017

WAL replication is the name of the game on this branch.

Fix two more unit tests. · 49e448be

Committed by Heikki Linnakangas on Dec 18, 2017

Fix unit test. · db9c4889

Committed by Heikki Linnakangas on Dec 18, 2017

Fix test, now that gp_persistent_relation_node table is no more. · c0dc24b0

Committed by Heikki Linnakangas on Dec 18, 2017

Remove some code that was left unused earlier. · fe12fd8c

Committed by Heikki Linnakangas on Dec 15, 2017

And clean up some comments that talked about persistent tables.

Remove remnants of persistent tables. · 334476ad

Committed by Heikki Linnakangas on Dec 15, 2017

They were not kept up-to-date anymore anyway. Remove the actual tables.

There are still a few references to these tables in the management tools.
AFAICS they're in tests, and I was hesitant to remove them just yet, in
case we're going to use the existing tests as a guide when writing new
tests.

Work around the fact that nothing might flush the WAL, in gp_replica_check. · 824b6d96

Committed by Heikki Linnakangas on Dec 18, 2017

gp_replica_check would often get stuck, waiting for the standby to apply all
the WAL it was sent. However, there is nothing to force a WAL flush in the
master. Usually, the last record in a transaction is a transaction commit,
which is flushed, and many other things cause a WAL flush too, but when
running the regression suite, often the last WAL record is a WAL-logged
hint bit update, just after a checkpoint.

To work around that, if the standby doesn't catch up in 20 seconds, issue
a CHECKPOINT in the master, to force a WAL flush. Something more
lightweight could be used to flush the WAL, but gp_replica_check needs the
data on disk to be up to date, so a checkpoint seems like a good idea.
In fact, perhaps we should always issue a CHECKPOINT, even before the first
attempt. Currently the python script does that, but now it seems redundant.
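The wait-then-force-a-flush workaround, together with the checkpoint-retry logic around `get_synced_lsns()`, can be sketched as follows. This is a hypothetical Python model, not the UDF's C code: `wait_for_synced_lsn`, the `get_lsns` and `force_wal_flush` callbacks, and the `max_checkpoints` limit are invented for illustration; only the 20-second timeout comes from the commit message. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time


def wait_for_synced_lsn(get_lsns, force_wal_flush, timeout_s=20.0,
                        poll_s=0.5, max_checkpoints=3,
                        clock=time.monotonic, sleep=time.sleep):
    """Wait until the mirror has applied all WAL the primary has flushed.

    get_lsns: hypothetical callback returning (primary_flush, mirror_apply).
    force_wal_flush: hypothetical callback, e.g. issuing CHECKPOINT on the
    master so that a trailing unflushed record gets flushed and sent.
    Returns the synced LSN, or raises TimeoutError.
    """
    for attempt in range(max_checkpoints + 1):
        if attempt > 0:
            # The mirror did not catch up in time; maybe nothing flushed
            # the last WAL record, so force a flush and try again.
            force_wal_flush()
        deadline = clock() + timeout_s
        while True:
            primary_flush, mirror_apply = get_lsns()
            if primary_flush == mirror_apply:
                return primary_flush
            if clock() >= deadline:
                break
            sleep(poll_s)
    # Message modeled on the gp_replica_check warning, not copied from it.
    raise TimeoutError(
        "unable to obtain synced LSN values between primary and mirror")
```

A checkpoint is heavier than a plain WAL flush, but as the commit notes, it also guarantees that the data files on disk are up to date before they are compared.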

Fix deletion of AOCO tables. · 4998a5e9

Committed by Heikki Linnakangas on Dec 18, 2017

An AOCO table doesn't have a '0' segfile at all. Therefore, using
smgrexists() to check if a relation exists on disk does not work.
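Because segment 0 may be absent, an existence check has to look for any segment file of the relation rather than only the base file. A minimal Python sketch, assuming the `<relfilenode>` / `<relfilenode>.<segno>` naming seen in AO segment paths such as `base/16384/61117.1152`; `ao_relation_exists` is a hypothetical helper, not a GPDB function, and scanning a whole directory is only meant to illustrate the logic.

```python
import os
import re
import tempfile


def ao_relation_exists(dirpath, relfilenode):
    """Return True if any segment file of the relation exists in dirpath.

    Assumes files are named "<relfilenode>" for segment 0 and
    "<relfilenode>.<segno>" for the others. An AOCO table may have no
    segment-0 file at all, so checking only "<relfilenode>" (roughly what
    an smgrexists() check on the main fork amounts to) is not enough.
    """
    pattern = re.compile(rf"^{relfilenode}(\.\d+)?$")
    return any(pattern.match(name) for name in os.listdir(dirpath))


with tempfile.TemporaryDirectory() as d:
    # Only a non-zero segment exists, as can happen for an AOCO table.
    open(os.path.join(d, "16384.1"), "wb").close()
    print(ao_relation_exists(d, 16384))  # True
```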

Remove some tests related to persistent tables · bd9c2109

Committed by Taylor Vesely on Dec 15, 2017

Now that we are removing the persistent tables, these tests no longer make sense.

Author: Taylor Vesely <tvesely@pivotal.io>
Author: Ashwin Agrawal <aagrawal@pivotal.io>

Update answer file, as an extra xlog record is now generated. · 87e21448

Committed by Ashwin Agrawal on Dec 15, 2017

Since AO/CO file creation generates an xlog record, update the answer file.

Fix deletion of AO and AOCS tables, to remove all segments. · 175c25e8

Committed by Heikki Linnakangas on Dec 15, 2017

This hopefully fixes the gp_replica_check failures we're seeing in the
pipeline.

WAL-log creation of empty AO segfiles. · 1eafc22f

Committed by Heikki Linnakangas on Dec 15, 2017

An empty segfile is mostly treated the same as a missing segfile, but for
the sake of gp_replica_check, WAL-log the creation of an empty segfile
anyway, so that there is no inconsistency between master and mirror where
an empty segfile exists on the master but is missing entirely on the
mirror. (I'm not entirely sure whether any non-testing code requires
that, too, so better safe than sorry.)

This should fix the warnings like this:

WARNING: Unable to open file /tmp/build/e18b2f02/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/base/16384/61117.1152

from gp_replica_check. (There are other failures still.)

Report WAL apply position in a more sensible way at page boundaries. · d137443d

Committed by Heikki Linnakangas on Dec 15, 2017

If there is some unused space at the end of a WAL page, because we never
split a WAL record header across pages, the WAL receiver's flush and apply
positions were reported inconsistently. The flush position would report the
end of the page, including the unused padding, while the apply position would
only go up to the end of the last WAL record on the page, excluding the
padding. If you compare the flush and apply positions for equality, it would
look as if not all of the WAL had been applied yet, even though the difference
between the pointers was just the unused padding space.

This will get fixed in PostgreSQL 9.3, where the padding at the end of a WAL
page is eliminated, but until then, tweak the reporting of the apply position
to also include any end-of-page padding. That makes the flush == apply
comparison a valid way to check whether all the flushed WAL has been applied,
even at page boundaries.
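The adjusted reporting rule can be expressed as a small calculation. The Python function below is a hypothetical sketch, not the WAL receiver code: it assumes the default 8 kB WAL page size (`XLOG_BLCKSZ`) and a 32-byte record header (`SIZE_OF_XLOG_RECORD`, a typical pre-9.3 value on 64-bit builds; the exact size in GPDB is an assumption here), and it treats positions as plain byte offsets. If the space left on the page is too small for a record header to start there, the rest of the page can only be padding, so the apply position is advanced to the page boundary.

```python
XLOG_BLCKSZ = 8192          # assumed WAL page size (the default)
SIZE_OF_XLOG_RECORD = 32    # assumed record header size; never split across pages


def reported_apply_position(end_of_last_record):
    """Return the apply position, including any end-of-page padding.

    Positions are modeled as plain byte offsets into the WAL stream.
    """
    offset_in_page = end_of_last_record % XLOG_BLCKSZ
    if offset_in_page == 0:
        return end_of_last_record  # already at a page boundary
    free_space = XLOG_BLCKSZ - offset_in_page
    if free_space < SIZE_OF_XLOG_RECORD:
        # No record header fits here, so the remainder is padding.
        return end_of_last_record + free_space
    return end_of_last_record


flush = 2 * XLOG_BLCKSZ                  # flush position at the page end
raw_apply = flush - 16                   # last record ended 16 bytes short
print(reported_apply_position(raw_apply) == flush)  # True
```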

I believe this explains the "unable to obtain start synced LSN values
between primary and mirror" failures we've been seeing from the
gp_replica_check test. gp_replica_check waits for apply == flush, and if
the last WAL record lands at a page boundary, that condition never became
true because of the padding. (Although I'm not sure why it used to work
earlier, or did it?)

Also remove 'duplicate_persistent' gpcheckcat test from the list in makefile. · 19a85dbb

Committed by Heikki Linnakangas on Dec 15, 2017

Remove checks related to persistent tables from gpcheckcat. · 126a7935

Committed by Heikki Linnakangas on Dec 15, 2017

Because persistent tables are no more.

NOTE: It would still be nice to check for consistency between pg_class
and files on disk, to check that there are no extra data files, and no
data files missing that have a pg_class entry. Same with AO seg files,
I suppose. But that's a significantly different query than what we have
here.
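The suggested consistency check boils down to comparing two sets. Here is a minimal Python sketch of that comparison only; obtaining the inputs (the relfilenodes of relations with storage from pg_class, and a listing of a database directory) is left out. `compare_catalog_to_disk` is a hypothetical helper, and it assumes data files are named `<relfilenode>` or `<relfilenode>.<segno>`, ignoring other file names such as `PG_VERSION`.

```python
import re

# Main data files and their segments: "16384", "16384.1", ...
DATA_FILE = re.compile(r"^(\d+)(?:\.\d+)?$")


def compare_catalog_to_disk(catalog_relfilenodes, disk_filenames):
    """Return (missing, extra) relfilenodes, each as a sorted list.

    missing: catalog entries with no data file on disk.
    extra:   data files on disk with no catalog entry.
    """
    on_disk = set()
    for name in disk_filenames:
        match = DATA_FILE.match(name)
        if match:  # skip non-data files such as PG_VERSION
            on_disk.add(int(match.group(1)))
    catalog = set(catalog_relfilenodes)
    return sorted(catalog - on_disk), sorted(on_disk - catalog)


missing, extra = compare_catalog_to_disk(
    [16384, 16390], ["16384", "16384.1", "16400", "PG_VERSION"])
print(missing, extra)  # [16390] [16400]
```

Counting any segment as evidence that the relation exists also covers AO and AOCO tables whose segment 0 is missing.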

Properly delete relation files on commit record replay. · d4e88eb6

Committed by Ashwin Agrawal on Dec 14, 2017

Resolve the GPDB_84_MERGE_FIXME now that we closely match upstream. Without
this fix, the relation files were not dropped during recovery or replay on
mirrors.

Implicitly create AO segfile in WAL replay, on first insertion to it. · f3e46dd6

Committed by Heikki Linnakangas on Dec 15, 2017

Because it's no longer created by the MMXLOG records.

Alternatively, we could have a separate WAL record type for the creation.
But this will do for now.
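Creating the segfile implicitly during replay amounts to opening it with a create flag before writing. A minimal POSIX-only Python sketch, assuming a simplified insert record that carries just a target offset and the bytes to write; `replay_ao_insert` is hypothetical and not GPDB's redo routine.

```python
import os
import tempfile


def replay_ao_insert(path, offset, data):
    """Apply a simplified AO insert record to the segfile at path.

    If the segfile does not exist yet (no separate WAL record creates it),
    O_CREAT creates it on the first insertion.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, data, offset)  # write at the logged offset (POSIX only)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    seg = os.path.join(d, "16384.1")
    replay_ao_insert(seg, 0, b"first")   # creates the file implicitly
    replay_ao_insert(seg, 5, b"second")
    with open(seg, "rb") as f:
        print(f.read())  # b'firstsecond'
```

Writing at the logged offset, rather than appending, also keeps this model idempotent if the same record is replayed twice.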

Fixes for AO WAL-logging. · 40fd8cac

Committed by Heikki Linnakangas on Dec 15, 2017

* Need to set relFileNode field correctly in MirroredAppendOnlyOpen, along
  with the File descriptor itself. Otherwise the relfilenode is set
  incorrectly in WAL records.

* Pretend that filespace location is always "tblspc_dummy_<tablespace oid>".
  The filespace/tablespace stuff is quite broken ATM, but hopefully this
  at least avoids some crashing.

Fix commit_transaction_block_checkpoint test. · f8112bde

Committed by Heikki Linnakangas on Dec 14, 2017

The fault injection points used in the test didn't exist anymore. Add a new
injection point in RecordTransactionCommit(), just before writing the commit
WAL record, and use that in the test.

Remove a bunch of fault injection IDs that are no longer used. (They are
still referenced in some TINC tests, but the injection points don't exist
anymore, so those tests will need to be rewritten if we want to keep them.)
