1. 13 Jan, 2018 40 commits
    • M
      Fix walrep test case failure. · a1990a80
      Max Yang committed
      Currently we start the standby master when WITH_MIRRORS=true, which
      makes the fake WAL receiver error out:
      number of requested standby connections exceeds max_wal_senders (currently 1)
      because the standby master already uses one wal_sender.
      To make the test pass, we remove the standby master at the beginning of this test
      and restore it at the end of the test.
      A better solution may be to make this value configurable at startup time,
      but this is just a simple fix to get the test passing.
      
      Author: Max Yang <myang@pivotal.io>
      Author: Xiaoran Wang <xiwang@pivotal.io>
      a1990a80
    • M
      Fix bgwriter_checkpoint test case · 5aa2f649
      Max Yang committed
      Since we start the standby master if WITH_MIRRORS=true, the number of
      entries in gp_segment_configuration changes, which changes the answer file.
      
      Author: Max Yang <myang@pivotal.io>
      Author: Xiaoran Wang <xiwang@pivotal.io>
      5aa2f649
    • M
      Start standby master in create-demo-cluster when WITH_MIRRORS = true. · 0c7f1281
      Max Yang committed
      Author: Max Yang <myang@pivotal.io>
      Author: Xiaoran Wang <xiwang@pivotal.io>
      0c7f1281
    • A
      gpaddmirrors: fix unit tests · 5cc2ddd0
      Asim R P committed
      The last commit removed the replication ports (replacing them with -1 in
      the Python utilities), and those numbers were being checked as part of
      this test. Comment the checks out and tag with FIXMEs.
      
      Author: Asim R P <apraveen@pivotal.io>
      Author: Jacob Champion <pchampion@pivotal.io>
      5cc2ddd0
    • H
      Quick fix to make gpstart work. · cadf63a8
      Heikki Linnakangas committed
      At least with gpdemo, on my laptop.
      
      We really shouldn't need these filerep port numbers anymore, right?
      cadf63a8
    • H
      Remove unused gp_initdb_mirrored variable. · c50fa05d
      Heikki Linnakangas committed
      And the mechanism in initdb and gpinitsystem to set it. It's no longer
      used for anything.
      c50fa05d
    • H
      Remove leftover LWLocks that are now unused. · 8a46029f
      Heikki Linnakangas committed
      8a46029f
    • H
      Remove GUCs and fault injection points related to PT and filerep. · e8dc97d4
      Heikki Linnakangas committed
      These were left over when Persistent Tables and Filerep were removed.
      e8dc97d4
    • H
      Remove cdbmirroredappendonly.[ch]. · 334d41a8
      Heikki Linnakangas committed
      What was left of it was a very thin and leaky abstraction, plus WAL-logging
      functions. Move the WAL-logging functions to a new file called
      cdbappendonlyxlog.c, and dismantle the MirroredAppendOnlyOpen abstraction.
      334d41a8
    • H
      Add more robust retry logic to gp_replica_check, so that it can be run online. · f6d42b45
      Heikki Linnakangas committed
      Instead of waiting for the primary and mirror to have the exact same LSN,
      add logic to retry the file comparisons a few times if there are any
      differences. This is a natural continuation of the earlier retry-loops I
      added there, but now the LSN checks are made so that we don't even expect
      the primary and mirror to sync on a particular value, and we retry not
      while trying to sync the LSNs, but during the comparison itself.
      
      This makes it possible to run gp_replica_check on a running cluster, while
      modifying tables. (The extra checkpoints it emits will have a performance
      impact on the other queries, though.) I tested this by running pgbench at the
      same time. You'll get a few NOTICEs about mismatches, but those are
      harmless. After a few automatic retries, it eventually passes.
      f6d42b45
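Editorial sketch: the retry approach described in f6d42b45 (repeat the file comparison a few times instead of insisting on matching LSNs first) could look roughly like the following. This is a minimal illustration, not the code from the commit; `compare_with_retries`, `compare_once`, and the attempt and delay defaults are hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, List


def compare_with_retries(
    compare_once: Callable[[], List[str]],
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as one comparison pass reports no mismatches.

    compare_once is a hypothetical callable that compares primary and
    mirror files and returns a list of mismatching file names.
    """
    for attempt in range(1, max_attempts + 1):
        mismatches = compare_once()
        if not mismatches:
            return True
        # On a busy cluster a mismatch may only mean the mirror has not
        # replayed a recent change yet, so report it and try again.
        print(f"NOTICE: attempt {attempt}: {len(mismatches)} mismatch(es)")
        if attempt < max_attempts:
            sleep(delay_seconds)
    return False
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays; a persistent mismatch still fails once the attempts run out.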
    • H
      Remove MirroredAppendOnly_Truncate() function. · f882ee40
      Heikki Linnakangas committed
      Might as well call FileTruncate directly.
      f882ee40
    • H
      Remove unused fields, and README. · e95a8ada
      Heikki Linnakangas committed
      e95a8ada
    • H
      Remove mirrored flatfile stuff. · 7a249d7e
      Heikki Linnakangas committed
      Revert the code to open/read/write regular files, the way it's done
      upstream.
      7a249d7e
    • H
      Remove gp_global_sequence tables. · f982fabe
      Heikki Linnakangas committed
      It's now unused.
      f982fabe
    • H
      It's not an automatic fail, if some WAL records were created during test. · 00ea0b87
      Heikki Linnakangas committed
      WAL could be created e.g. by checkpoints, or some background activity that
      sets hint bits. Such activity might cause a failure, if a data file is
      modified in the master, but the change has not been replayed in the standby
      yet. But just because it can make our check fail doesn't mean we need
      to treat it as an automatic failure. Keep the warning, but consider the test
      as a success, if the check itself found nothing wrong.
      00ea0b87
    • H
      In gp_replica_check, wait for checkpoint to finish. · 189ca232
      Heikki Linnakangas committed
      I removed the CHECKPOINT calls from the python script yesterday, replacing
      them with RequestCheckpoint() in the UDF itself. But I didn't use the
      CHECKPOINT_WAIT flag, so it might go ahead with the checking before the
      checkpoint has run. That might explain the gp_replica_check failures we're
      seeing in the pipeline now.
      189ca232
    • H
      Add the same work-around to getting a "synced" LSN after checks, as before. · f21ecf52
      Heikki Linnakangas committed
      Move the checkpoint-retry logic to within get_synced_lsns(), so that it
      applies to the synced LSN we get after running all the checks, too. When
      I added the retry logic to the get_synced_lsns() call before the checks,
      I didn't realize that there's a second call after the checks.
      
      This hopefully fixes the "WARNING:  unable to obtain end synced LSN values
      between primary and mirror" messages we're still occasionally seeing
      in the pipeline.
      f21ecf52
    • H
      f3f4c8fa
    • H
      Also remove references to enable_segwalrep in Makefiles. · 1697a640
      Heikki Linnakangas committed
      I removed the autoconf flag and #ifdefs earlier, but missed these.
      1697a640
    • H
      Remove some stray references to MirroredLock · 953a358f
      Heikki Linnakangas committed
      953a358f
    • H
      Remove --disable-segwalrep option, and the #ifdefs. · db7e4020
      Heikki Linnakangas committed
      WAL replication is the name of the game on this branch.
      db7e4020
    • H
      Fix two more unit tests. · 49e448be
      Heikki Linnakangas committed
      49e448be
    • H
      Fix unit test. · db9c4889
      Heikki Linnakangas committed
      db9c4889
    • H
      Remove some code that was left unused earlier. · fe12fd8c
      Heikki Linnakangas committed
      And clean up some comments that talked about persistent tables.
      fe12fd8c
    • H
      Remove remnants of persistent tables. · 334476ad
      Heikki Linnakangas committed
      They were not kept up-to-date anymore anyway. Remove the actual tables.
      
      There are still a few references to these tables in the management tools.
      AFAICS they're in tests, and I was hesitant to remove them just yet, in
      case we're going to use the existing tests as a guide when writing new
      tests.
      334476ad
    • H
      Work around the fact that nothing might flush the WAL, in gp_replica_check. · 824b6d96
      Heikki Linnakangas committed
      gp_replica_check would often get stuck, waiting for the standby to apply all
      the WAL it was sent. However, there is nothing to force a WAL flush in the
      master. Usually, the last record in a transaction is a transaction commit,
      which is flushed, and many other things cause a WAL flush too, but when
      running the regression suite, often the last WAL record is a WAL-logged
      hint bit update, just after a checkpoint.
      
      To work around that, if the standby doesn't catch up in 20 seconds, issue
      a CHECKPOINT in the master, to force a WAL flush. Something more
      lightweight could be used to flush the WAL, but gp_replica_check needs the
      data on disk to be up to date, so a checkpoint seems like a good idea.
      In fact, perhaps we should always issue a CHECKPOINT, even before the first
      attempt. Currently the python script does that, but now it seems redundant.
      824b6d96
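Editorial sketch: the workaround in 824b6d96 (wait for the standby to catch up, and issue a CHECKPOINT on the master if it has not done so within 20 seconds) can be pictured as below. This is not the commit's code; `wait_for_standby`, `is_caught_up`, `force_checkpoint`, and the `max_checkpoints` limit are hypothetical and exist only for illustration. Only the 20-second timeout comes from the commit message.

```python
import time
from typing import Callable


def wait_for_standby(
    is_caught_up: Callable[[], bool],
    force_checkpoint: Callable[[], None],
    timeout_seconds: float = 20.0,
    poll_seconds: float = 0.5,
    max_checkpoints: int = 3,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the standby has applied all WAL, forcing a checkpoint
    (and therefore a WAL flush) each time a wait period expires."""
    checkpoints_issued = 0
    while True:
        deadline = clock() + timeout_seconds
        while clock() < deadline:
            if is_caught_up():
                return True
            sleep(poll_seconds)
        if checkpoints_issued >= max_checkpoints:
            # Give up rather than loop forever (an assumption of this sketch).
            return False
        force_checkpoint()
        checkpoints_issued += 1
```

In a real tool `force_checkpoint` would presumably run `CHECKPOINT` on the master and `is_caught_up` would compare the standby's flush and apply positions; both are left as callables here so the control flow can be checked with a fake clock.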
    • H
      Fix deletion of AOCO tables. · 4998a5e9
      Heikki Linnakangas committed
      An AOCO table doesn't have a '0' segfile at all. Therefore, using
      smgrexists() to check if a relation exists on disk does not work.
      4998a5e9
    • T
      Remove some tests related to persistent tables · bd9c2109
      Taylor Vesely committed
      Now that we are removing the persistent tables, these tests no longer make sense.
      
      Author: Taylor Vesely <tvesely@pivotal.io>
      Author: Ashwin Agrawal <aagrawal@pivotal.io>
      bd9c2109
    • A
      Update answer file, as an extra xlog record is now generated. · 87e21448
      Ashwin Agrawal committed
      Since AO/CO file creation generates an xlog record, update the answer file.
      87e21448
    • H
      Fix deletion of AO and AOCS tables, to remove all segments. · 175c25e8
      Heikki Linnakangas committed
      This hopefully fixes the gp_replica_check failures we're seeing in the
      pipeline.
      175c25e8
    • H
      WAL-log creation of empty AO segfiles. · 1eafc22f
      Heikki Linnakangas committed
      An empty segfile is mostly treated the same as a missing segfile, but for
      the sake of gp_replica_check, WAL-log the creation of an empty segfile
      anyway, so that there is no inconsistency between master and mirror, such
      that an empty segfile exists on master, but it's missing entirely in the
      mirror. (I'm not entirely sure if there is non-testing code that requires
      that, too, so better safe than sorry).
      
      This should fix the warnings like this:
      
      WARNING:  Unable to open file /tmp/build/e18b2f02/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1/base/16384/61117.1152
      
      from gp_replica_check. (There are other failures still.)
      1eafc22f
    • H
      Report WAL apply position in a more sensible way at page boundaries. · d137443d
      Heikki Linnakangas committed
      If there is some unused space at the end of a WAL page, because we never
      split a WAL record header, the WAL receiver's flush and apply positions were
      reported a bit funnily. The flush position would report the end of the page,
      including the unused padding, while the apply position would only go up to
      the end of the last WAL record on the page, excluding the padding. If you
      compare flush == apply positions, it would look as if not all of the WAL
      had been applied yet, even though the difference between the pointers was
      just the unused padding space.
      
      This will get fixed in PostgreSQL 9.3, where the padding at end of WAL page
      is eliminated, but until then, tweak the reporting of the apply position to
      also include any end-of-page padding. That makes the flush == apply
      comparison a valid way to check if all the flushed WAL has been applied,
      even at page boundaries.
      
      I believe this explains the "unable to obtain start synced LSN values
      between primary and mirror" failures we've been seeing from the
      gp_replica_check test. gp_replica_check waits for apply == flush, and if
      the last WAL record lands at a page boundary, that condition never became
      true because of the padding. (Although I'm not sure why it used to work
      earlier, or did it?)
      d137443d
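Editorial sketch: the adjustment described in d137443d amounts to rounding the reported apply position up to the page boundary whenever the space left on the page is too small to hold a record header. The arithmetic below illustrates that idea only; it is not the patch itself. The 8192-byte page size and 32-byte header size are assumed values, and `apply_position_for_reporting` is a hypothetical name.

```python
XLOG_BLCKSZ = 8192       # assumed WAL page size in bytes
RECORD_HEADER_SIZE = 32  # assumed record header size, for illustration only


def apply_position_for_reporting(end_of_last_record: int) -> int:
    """Return the apply position to report, including end-of-page padding.

    If fewer than RECORD_HEADER_SIZE bytes remain on the page, no record
    can start there, so the remainder is padding and is counted as applied.
    """
    offset = end_of_last_record % XLOG_BLCKSZ
    if offset == 0:
        return end_of_last_record  # already exactly at a page boundary
    free = XLOG_BLCKSZ - offset
    if free < RECORD_HEADER_SIZE:
        return end_of_last_record + free  # skip over the padding
    return end_of_last_record
```

With this adjustment, a record ending 8 bytes short of a page boundary is reported at the boundary itself, so a `flush == apply` comparison can succeed there.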
    • H
      Remove checks related to persistent tables from gpcheckcat. · 126a7935
      Heikki Linnakangas committed
      Because persistent tables are no more.
      
      NOTE: It would still be nice to check for consistency between pg_class
      and files on disk, to check that there are no extra data files, and no
      data files missing that have a pg_class entry. Same with AO seg files,
      I suppose. But that's a significantly different query than what we have
      here.
      126a7935
    • A
      Properly delete relation files on commit record replay. · d4e88eb6
      Ashwin Agrawal committed
      Resolve the GPDB_84_MERGE_FIXME, now that we closely match upstream. Without
      this fix the relation files were not dropped during recovery or replay on
      mirrors.
      d4e88eb6
    • H
      Implicitly create AO segfile in WAL replay, on first insertion to it. · f3e46dd6
      Heikki Linnakangas committed
      Because it's no longer created by the MMXLOG records.
      
      Alternatively, we could have a separate WAL record type for the creation.
      But this will do for now.
      f3e46dd6
    • H
      Fixes for AO WAL-logging. · 40fd8cac
      Heikki Linnakangas committed
      * Need to set relFileNode field correctly in MirroredAppendOnlyOpen, along
        with the File descriptor itself. Otherwise the relfilenode is set
        incorrectly in WAL records.
      
      * Pretend that filespace location is always "tblspc_dummy_<tablespace oid>".
        The filespace/tablespace stuff is quite broken ATM, but hopefully this
        at least avoids some crashing.
      40fd8cac
    • H
      Fix commit_transaction_block_checkpoint test. · f8112bde
      Heikki Linnakangas committed
      The fault injection points used in the test didn't exist anymore. Add a new
      injection point in RecordTransactionCommit(), just before writing the commit
      WAL record, and use that in the test.
      
      Remove a bunch of fault injection IDs that are no longer used. (They are
      still referenced in some TINC tests, but the injection points don't exist
      anymore, so those tests will need to be rewritten if we want to keep them.)
      f8112bde