1. 13 Jan, 2018 40 commits
    • H
      Make newly-added gp_bloat_diag test less sensitive. · c91e320c
      Heikki Linnakangas committed
      The test was sensitive to the number of pages in the pg_rewrite system
      table's index, for no good reason. Also, don't create a new database for
      it, to speed it up.
      c91e320c
    • H
      Remove LWLockWaitCancel(). · 67795819
      Heikki Linnakangas committed
      It was only used by the filerep code. Now that that's gone, this was just
      dead code.
      67795819
    • J
      Remove dead/unreferenced db state code (#4225) · 145eb5d6
      Jacob Champion committed
      The DB_IN_STANDBY_NEW_TLI_SET state doesn't really seem to do anything
      anymore, as of commit 813b817cc. Remove it entirely to get rid of an
      assertion during standby tests. Also remove multipass function
      declarations; they're gone.
      145eb5d6
    • A
      Disable newly added 2PC test as it is failing in CI. · 6494b02c
      Ashwin Agrawal committed
      This test passed locally, and also in the PR pipeline and a forked pipeline
      multiple times, but it fails intermittently in the main CI pipeline, hence
      disabling it. The failure happens when the master connects to the segments for
      the first time after a PANIC, and it fails with
      
      ```
      +LOG:  could not connect to segment: initialization of segworker group failed (cdbgang.c:235)
      +LOG:  could not connect to segment: initialization of segworker group failed (cdbgang.c:235)
      
      2018-01-02 02:44:16.225927 UTC,"gpadmin","isolation2test",p33565,th-1615808736,"[local]",,2018-01-02 02:44:16 UTC,0,con640,,seg-1,,,,sx1,"FATAL","XX000","DTM initialization: failure during startup recovery, retry failed, check segment status (cdbtm.c:1537)",,"Process 33565 will wait for gp_debug_linger=120 seconds before termination.
      Note that its locks and other resources will not be released until then.",,,,,0,,"cdbtm.c",1537,"Stack trace:
      1    0x9c1afb postgres errstart + 0x1db
      2    0x9c3ca9 postgres elog_finish + 0xb9
      3    0xadedea postgres initTM (cdbtm.c:1536)
      4    0x9dac77 postgres InitPostgres + 0x857
      5    0x8867a7 postgres PostgresMain + 0x207
      6    0x81c97d postgres <symbol not found> (postmaster.c:0)
      7    0x81eea2 postgres PostmasterMain + 0xc42
      8    0x73f0a1 postgres main (main.c:206)
      9    0x7f909b4fbd1d libc.so.6 __libc_start_main + 0xfd
      10   0x4bf2b5 postgres <symbol not found> + 0x4bf2b5
      
      ```
      
      Investigating; will re-enable this newly added test once we find out why the
      first connection from the master is failing.
      6494b02c
    • J
      Revert "cs_walrep_1: disable gpactivatestandby tests for now" · c28fd064
      Jacob Champion committed
      Unfortunately the cluster crashes anyway two tests later. Rather than
      comment out half the tests to get a fake green, put this set of tests
      back. We'll just have to solve this one problem at a time.
      
      This reverts commit 5982a72614492916187ca27fc660d7cc7e3b69e1.
      c28fd064
    • J
      cs_walrep_1: disable gpactivatestandby tests for now · 1384a094
      Jacob Champion committed
      The promotion logic that gpactivatestandby relies on doesn't work yet,
      and when these tests fail, they leave the cluster completely unusable.
      1384a094
    • A
      Rewrite a 2PC test in isolation2. · c20ac186
      Ashwin Agrawal committed
      This test in TINC is very shaky, as it brings down the primary and mirror and
      hence affects gp_segment_configuration.

      The test intends to fail the broadcast of COMMIT PREPARED to one segment and
      hence trigger a PANIC in the master while completing phase 2 of 2PC. The
      master's recovery cycle should correctly broadcast COMMIT PREPARED again,
      because the master should find the distributed commit record in its xlog during
      recovery. Verify that the transaction is committed after recovery. This scenario
      used to create cluster inconsistency due to a bug that is now fixed: the
      transaction was committed on all segments except the one where the COMMIT
      PREPARED broadcast failed before recovery. The master used to miss sending
      COMMIT PREPARED across the restart and instead aborted the transaction after
      querying in-doubt prepared transactions from the segments.
      c20ac186
    • A
      Add retry in isolation2 test framework for database restart. · b26fe4eb
      Ashwin Agrawal committed
      To support writing tests where a session can cause a PANIC of the master, add
      retry logic while establishing connections in isolation2. This helps to keep the
      tests simple.
      b26fe4eb
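The retry idea in this commit can be illustrated with a minimal Python sketch. This is not the isolation2 code: `connect_with_retry`, its parameters, and the choice of `ConnectionError` are hypothetical names picked for illustration, and the real framework may retry on different conditions.

```python
import time


def connect_with_retry(connect, attempts=5, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are used up.

    connect  -- hypothetical zero-argument callable that returns a connection
                or raises ConnectionError while the database is restarting
    attempts -- maximum number of tries (assumed value, not from the commit)
    delay    -- seconds to wait between tries (assumed value)
    sleep    -- injectable sleep function, so tests need not actually wait
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as err:
            last_error = err
            # Wait only if another attempt is still to come.
            if attempt < attempts:
                sleep(delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

With a helper of this shape, a test that deliberately PANICs the master would not need its own reconnect loop; it would simply open the next session through the helper.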
    • A
      9586ea99
    • A
      Add retries using grace period for declaring mirror down. · cd647b1f
      Ashwin Agrawal committed
      If FTS detects a primary as down, it retries n times before marking it down. But
      a mirror gets marked as down if its connection to the primary has not been made
      or has been lost. This surfaced as a problem mostly during cluster start
      (gpstart), where the sequence is to start the primaries and mirrors, followed by
      the master. In many instances, when the master probed a primary, the mirror
      connection was yet to be made, and hence a mirror that was up in the
      configuration unnecessarily got marked down, even if the mirror established its
      connection to the primary just a few seconds later.

      So, to avoid such situations and to make it a little more resilient against
      minor network glitches, add a variable to record when initialization or
      disconnection happened. Using it, the FTS probe can now find out how long the
      mirror has not shown up. Only if the mirror has not shown up for the allowed
      period (30 seconds for now) is it reported as down; otherwise FTS is asked to
      retry the probe. This logic doesn't affect the regular flow, and it also avoids
      any waiting in utilities for specific states after cluster restart.
      cd647b1f
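The grace-period decision described in this commit can be sketched as a small pure function. The actual change lives in the server's C code; the Python below is only an illustration, and `mirror_probe_result`, its arguments, and the inclusive boundary check are assumptions. Only the 30-second period comes from the commit message.

```python
GRACE_PERIOD_SECS = 30  # allowed period quoted in the commit message


def mirror_probe_result(mirror_connected, disconnected_at, now,
                        grace=GRACE_PERIOD_SECS):
    """Decide what a probe should report about a mirror.

    mirror_connected -- True if the mirror currently has a connection
    disconnected_at  -- timestamp (seconds) recorded at initialization or
                        when the mirror connection was lost
    now              -- current timestamp (seconds)
    Returns "up", "retry", or "down" (hypothetical labels).
    """
    if mirror_connected:
        return "up"
    # The mirror is missing: report it down only once the grace period has
    # fully elapsed. Treating the boundary as inclusive is an assumption.
    if now - disconnected_at >= grace:
        return "down"
    # Still within the grace period: ask the prober to try again later.
    return "retry"
```

For example, a mirror that has been missing for 5 seconds after cluster start yields `"retry"`, while one that has been missing for 45 seconds yields `"down"`.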
    • A
      gpstop -u should not specifically check for "No such process" · bb0b9cb5
      Ashwin Agrawal committed
      If the postmaster.pid file is present, reload gets the error "No such
      process". But if postmaster.pid is not present, the error returned is
      "pg_ctl: PID file "......../postmaster.pid" does not exist". So it's better not
      to check for any particular error message, but to report that the segments
      failed to be reloaded.
      bb0b9cb5
    • A
      Restore logic to skip databases that cannot be connected to when computing the oldest database. · 52884f32
      Ashwin Agrawal committed
      GPDB skips databases that cannot be connected to when computing the oldest
      database in vac_truncate_clog(). Make write_database_file(), which was
      reverted to the upstream version, do the same. This helps to get the storage
      tests green for now.

      Later, this code can be removed uniformly from vac_truncate_clog() and
      write_database_file() if a better solution is found for the original issue for
      which this check was added.
      52884f32
    • A
      6554324b
    • A
      Add walrep specific states to gparray.py. · f812f6d6
      Ashwin Agrawal committed
      With walrep we have a new state, 'n' (not in sync). So, add the valid states
      corresponding to it to let some tests pass. A lot more cleanup needs to happen
      in this area to remove filerep-specific states, but that's work for a different
      commit.
      f812f6d6
    • A
      Remove tests for cross check between gp_relation_node and pg_aoseg · d97dfc1e
      Ashwin Agrawal committed
      Since the gp_relation_node table no longer exists, there is no point testing
      whether an ERROR is reported when pg_aoseg and gp_relation_node are not in sync.
      d97dfc1e
    • A
      Delete duplicate_entries tests as they are specific to filerep. · 629b7b3f
      Ashwin Agrawal committed
      The test is specific to filerep behavior where truncate was not properly
      resynced, causing the problem.
      629b7b3f
    • A
      Fix expected return code for mm_gpcheckcat. · 40ba2864
      Ashwin Agrawal committed
      40ba2864
    • A
      Force FTS scan after stopping the mirror that failed to recover incrementally. · cc8240a9
      Asim R P committed
      FTS scan marks the stopped mirror as down so that subsequent recoverfull works.
      cc8240a9
    • A
      Remove filerep specific tests from Storage suite. · 2aa2e656
      Ashwin Agrawal committed
      This removes the make target storage_filerep.
      2aa2e656
    • A
      Delete test_AOCOAlterColumnChangeTracking. · c56c93a7
      Ashwin Agrawal committed
      This test is no longer relevant with WAL replication. This should get the
      `aocoalter_catalog_loaders` task in the Storage schedule green.
      c56c93a7
    • A
      Start running segspace test for wal replication. · acec2c16
      Ashwin Agrawal committed
      Now that gpstop/gpstart works for WAL replication, remove segspace from
      --exclude-tests. filespace is the only one that remains in the --exclude-tests
      list, and it should go away soon as well.
      acec2c16
    • J
      Make gpstart work for walrep mirrors · 2b5720bf
      Jimmy Yih committed
      All that was needed was to make sure mirrors are not started with
      pg_ctl -w flag since the mirror is in recovery mode and will not
      respond to PQPing messages.
      
      Author: Jimmy Yih <jyih@pivotal.io>
      Author: Marbin Tan <mtan@pivotal.io>
      2b5720bf
    • J
      gpinitsystem with walrep mirrors instead of filerep mirrors · ce4d96b6
      Jimmy Yih committed
      With file replication gone, gpinitsystem should no longer try to
      initialize the cluster through filerep sequence.
      
      The sequence now goes as follows:
      1. Create and start master in master-only mode
      2. Create primaries and register to master
      3. Stop master.
      4. Run gpstart to start master and primaries.
      5. Create mirrors w/ pg_basebackup and register to master.
      6. Start the mirrors and wait until primaries and mirrors sync.
      
      Author: Jimmy Yih <jyih@pivotal.io>
      Author: Marbin Tan <mtan@pivotal.io>
      ce4d96b6
    • H
      Try to avoid race condition in test, when querying pg_partitions. · 3b74382b
      Heikki Linnakangas committed
      pg_partitions contains calls to pg_get_expr() function. That function
      suffers from a race condition: If the relation is dropped between the
      get_rel_name() call, and another syscache lookup in pg_get_expr_worker(),
      you get a "relation not found" error. The error message is reasonable,
      and I don't see any easy fix for the pg_partitions view itself, so just
      try to avoid hitting that in the tests.
      
      For some reason we are hitting that frequently in this particular query.
      Change it to query pg_class instead, it doesn't use any of the more
      complicated fields from pg_partitions, anyway.
      
      I'm pushing this to the 'walreplication' branch first, because for some
      reason, we're seeing the failure there more often than on 'master'. If
      this fixes the problem, I'll push this to 'master', too.
      3b74382b
    • H
      Remove remnants of multi-pass startup. · 69578765
      Heikki Linnakangas committed
      69578765
    • M
      Shutdown standby master to make walrep related test pass. · 40bc027a
      Max Yang committed
      Author: Max Yang <myang@pivotal.io>
      Author: Xiaoran Wang <xiwang@pivotal.io>
      40bc027a
    • M
      Add answer file for restart_standup case · 2943e824
      Max Yang committed
      Author: Max Yang <myang@pivotal.io>
      Author: Xiaoran Wang <xiwang@pivotal.io>
      2943e824
    • M
      Fix walrep test case failure. · a1990a80
      Max Yang committed
      Currently we start the standby master when WITH_MIRRORS=true, which
      makes the fake WAL receiver error out with:
      number of requested standby connections exceeds max_wal_senders (currently 1)
      because the standby master already uses one wal_sender.
      To make the test pass, we remove the standby master at the beginning of this
      test and recover it at the end of the test.
      A better solution may be to make this value configurable at startup time,
      but this is just a simple fix to get the test passing.
      
      Author: Max Yang <myang@pivotal.io>
      Author: Xiaoran Wang <xiwang@pivotal.io>
      a1990a80
    • M
      Fix bgwriter_checkpoint test case · 5aa2f649
      Max Yang committed
      Since we start the standby master if WITH_MIRRORS=true, the number of entries
      in gp_segment_configuration changes, which results in a change to the answer
      file.
      
      Author: Max Yang <myang@pivotal.io>
      Author: Xiaoran Wang <xiwang@pivotal.io>
      5aa2f649
    • M
      Start standby master in create-demo-cluster when WITH_MIRRORS = true. · 0c7f1281
      Max Yang committed
      Author: Max Yang <myang@pivotal.io>
      Author: Xiaoran Wang <xiwang@pivotal.io>
      0c7f1281
    • A
      gpaddmirrors: fix unit tests · 5cc2ddd0
      Asim R P committed
      The last commit removed the replication ports (replacing them with -1 in
      the Python utilities), and those numbers were being checked as part of
      this test. Comment the checks out and tag with FIXMEs.
      
      Author: Asim R P <apraveen@pivotal.io>
      Author: Jacob Champion <pchampion@pivotal.io>
      5cc2ddd0
    • H
      Quick fix to make gpstart work. · cadf63a8
      Heikki Linnakangas committed
      At least with gpdemo, on my laptop.
      
      We really shouldn't need these filerep port numbers anymore, right?
      cadf63a8
    • H
      Remove unused gp_initdb_mirrored variable. · c50fa05d
      Heikki Linnakangas committed
      And the mechanism in initdb and gpinitsystem to set it. It's no longer
      used for anything.
      c50fa05d
    • H
      Remove leftover LWLocks that are now unused. · 8a46029f
      Heikki Linnakangas committed
      8a46029f
    • H
      Remove GUCs and fault injection points related to PT and filerep. · e8dc97d4
      Heikki Linnakangas committed
      These were left over when Persistent Tables and Filerep were removed.
      e8dc97d4
    • H
      Remove cdbmirroredappendonly.[ch]. · 334d41a8
      Heikki Linnakangas committed
      What was left of it, was a very thin and leaky abstraction, plus WAL-logging
      functions. Move the WAL-logging functions to a new file called
      cdbappendonlyxlog.c, and dismantle the MirroredAppendOnlyOpen abstraction.
      334d41a8
    • H
      Add more robust retry logic to gp_replica_check, so that it can be run online. · f6d42b45
      Heikki Linnakangas committed
      Instead of waiting for the primary and mirror to have the exact same LSN,
      add logic to retry the file comparisons a few times if there are any
      differences. This is a natural continuation of the earlier retry-loops I
      added there, but now the LSN checks are made so that we don't even expect
      the primary and mirror to sync on a particular value, and we retry not
      while trying to sync the LSNs, but during the comparison itself.
      
      This makes it possible to run gp_replica_check on a running cluster, while
      modifying tables. (The extra checkpoints it emits will have a performance
      impact on the other queries, though.) I tested this by running pgbench at the
      same time. You'll get a few NOTICEs about mismatches, but those are
      harmless. After a few automatic retries, it eventually passes.
      f6d42b45
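The "retry during the comparison itself" approach can be illustrated with a short Python sketch. It does not reproduce gp_replica_check, which is part of the server code: `compare_with_retries`, the reader callables, and the retry count are hypothetical, and the sketch assumes that a mismatch on a busy cluster is usually transient because the mirror is still replaying WAL.

```python
def compare_with_retries(read_primary, read_mirror, max_retries=3,
                         on_mismatch=print):
    """Compare primary and mirror contents, retrying on differences.

    read_primary, read_mirror -- hypothetical zero-argument callables that
                                 return the current file contents
    max_retries               -- number of comparison attempts (assumed value)
    on_mismatch               -- callback used to report each mismatch
    Returns True as soon as one attempt finds identical contents.
    """
    for attempt in range(1, max_retries + 1):
        # Re-read both sides on every attempt instead of waiting for the
        # two nodes to agree on a particular LSN first.
        if read_primary() == read_mirror():
            return True
        on_mismatch("NOTICE: mismatch on attempt %d, retrying" % attempt)
    return False
```

In this shape, a mismatch only counts as a failure if it persists across all attempts, which matches the commit's observation that the occasional NOTICE about a mismatch is harmless.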
    • H
      Remove MirroredAppendOnly_Truncate() function. · f882ee40
      Heikki Linnakangas committed
      Might as well call FileTruncate directly.
      f882ee40
    • H
      Remove unused fields, and README. · e95a8ada
      Heikki Linnakangas committed
      e95a8ada
    • H