1. 13 Aug, 2019 1 commit
    • Implement TwoPhaseFileHeader details output in pg_xlogdump · b74f174b
      Ivan Leskin authored
      Output the following in pg_xlogdump for prepare transaction records:
      * Global transaction ID
      * Prepare time
      * Tablespace OIDs:
      	* 'tablespace_oid_to_delete_on_abort'
      	* 'tablespace_oid_to_delete_on_commit'
      
      These data come from the TwoPhaseFileHeader structure, which GPDB
      writes to the xlog.
      
      Move TwoPhaseFileHeader to `twophase.h` so that the structure can be
      used by `xact_desc_prepare()` in `xactdesc.c`.
      Co-Authored-By: David Kimura <dkimura@pivotal.io>
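The output described in the commit above can be sketched as a describe routine that appends the header fields to a line being built. This is only an illustration: `SketchTwoPhaseHeader`, `sketch_desc_prepare`, the field types, and the exact output wording are assumptions, and the real `TwoPhaseFileHeader` in `twophase.h` has more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for TwoPhaseFileHeader. */
typedef struct SketchTwoPhaseHeader
{
    char     gid[200];          /* global transaction ID */
    char     prepared_at[64];   /* prepare time, already formatted */
    uint32_t tablespace_oid_to_delete_on_abort;
    uint32_t tablespace_oid_to_delete_on_commit;
} SketchTwoPhaseHeader;

/*
 * Append a description of the header to the NUL-terminated text in buf.
 * Returns false if the result would not fit in buflen bytes.
 */
static bool
sketch_desc_prepare(char *buf, size_t buflen, const SketchTwoPhaseHeader *hdr)
{
    size_t used;
    int    n;

    assert(buf != NULL && hdr != NULL);
    used = strlen(buf);
    if (used >= buflen)
        return false;

    n = snprintf(buf + used, buflen - used,
                 "gid = %s, prepared at = %s, "
                 "tablespace_oid_to_delete_on_abort = %u, "
                 "tablespace_oid_to_delete_on_commit = %u",
                 hdr->gid, hdr->prepared_at,
                 (unsigned) hdr->tablespace_oid_to_delete_on_abort,
                 (unsigned) hdr->tablespace_oid_to_delete_on_commit);
    return n >= 0 && (size_t) n < buflen - used;
}
```

A caller would start with a buffer that already holds the record prefix and let the routine append to it, much as a description callback appends to a string buffer.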
  2. 02 Aug, 2019 1 commit
  3. 06 Jul, 2019 1 commit
    • Refactor making databases 2PC and correct ALTER DATABASE SET TABLESPACE · b0208894
      Soumyadeep Chakraborty authored
      High-level details:
      1) Tear out existing 2PC infrastructure for databases
      2) Introduce new structures (in-mem, on-disk) for making databases 2PC
      3) Make sure ALTER DATABASE SET TABLESPACE (ADST) works correctly such
      that no orphaned files remain in the source/target tablespace
      directories in cases of commit/abort.
      4) We add the dboid dir to pendingDbDeletes for abort after copying
      dboid dir in movedb. This ensures that the dboid dir is cleaned up from
      the target tablespace when we have a failure during the course of
      execution of the ALTER.
      5) Re-engineer tests to assert filesystem changes and accommodate
      multiple failure scenarios during the course of a 2PC commit/abort.
      6) Introduce a new generic mechanism for making mirrors catch up. This
      is done by inserting a XLOG_XACT_NOOP record and then waiting for the
      mirror to replay the record with gp_inject_fault2. The premise is that
      if the mirror has replayed this latest record, it would have replayed
      everything before the NOOP record. We have introduced a UDF and a fault
      injection point to make this possible.
      7) Update pg_xlogdump to dump out pending DB deletes on commit/abort.
      
      Refactoring Notes:
      1) The xlog insert for database drops in `DropDatabaseDirectory`
      is no longer required since we are already WALing database drops
      elsewhere.
      2) `xact_get_distributed_info_from_commit` populated variables that were
      useful only in the context of its sole caller, `xact_redo_distributed_commit`.
      Keeping it around would mean unnecessary code duplication, so it is
      deleted.
      3) Use DropRelationFiles() instead of duplicating logic in
      xact_redo_distributed_commit.
      4) Refactor session-level lock release for the QD during movedb().
      5) smgr is a layer that pertains only to relations. So, we drop the
      smgr prefix for pending db delete functions.
      6) DropDatabaseDirectories() pertains to dropping dboid dirs at the
      filesystem level. This means that it belongs inside storage.h/c.
      DropDatabaseDirectories forms the visible interface as contrasted to
      DropDatabaseDirectory. So moved the latter and made it static.
      7) Rename DatabaseDropStorage to ScheduleDbDirDelete to call out its
      intent.
      8) Extract pending db deletes and dboid dir removal to a dedicated file.
      9) Tests for scenarios with and without faults now live in one
      consolidated file.
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Adam Berlin <aberlin@pivotal.io>
      Co-authored-by: Jesse Zhang <jzhang@pivotal.io>
  4. 29 Jun, 2019 1 commit
    • Remove XLOG_XACT_ONE_PHASE_COMMIT xlog record · 4be677c7
      David Kimura authored
      The issue is that Postgres 9.5 has a fixed number of xl_info flags
      available. GPDB commit b5871009 added another flag, which in the
      GPDB 9.5 merge branch exceeds the number available.
      
      Ashwin noticed that the difference between XLOG_XACT_ONE_PHASE_COMMIT
      and XLOG_XACT_COMMIT records is whether distributed info is set. We can
      instead use the same record type in both cases if we set up distributed
      info in RecordTransactionCommit() and then pass it through xact_redo().
  5. 21 Jun, 2019 1 commit
    • Implement one-phase optimization · b5871009
      xiong-gang authored
      If a transaction has only updated one QE, we can do one-phase commit
      there.
      
      If one-phase commit transactions don't write pg_distributedlog, the tuples'
      visibility will be checked only against the local snapshot. This produces
      incorrect results at the repeatable read isolation level.
      
      For example:
      create table t(a int);
      tx 1: BEGIN ISOLATION LEVEL REPEATABLE READ;
      tx 2: insert into t values(1);
      tx 1: select * from t where a = 1;
      tx 2: insert into t values(1);
      tx 2: insert into t values(2);
      tx 1: select * from t;
      
      The first SELECT of tx1 will create a distributed snapshot on QD and a local
      snapshot on segment 1, and the later SELECT of tx1 will create a local snapshot
      on segment 2. In this way, the later SELECT sees the first and the third tuple
      but not the second one.
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
  6. 20 Jun, 2019 1 commit
    • Ensure temp relation files are removed from the file-system on a DROP. · b5770c13
      Soumyadeep Chakraborty authored
      This fixes issue #7922. For a DROP on a relation, after the COMMIT
      PREPARED record is written on the QE, we remove the relation file from
      disk (call to DropRelationFiles() inside FinishPreparedTransaction()).
      While doing so, we need to know whether the relation is a temp relation
      such that the file deletion code can prepend the 't_' prefix to the
      relfilenode name.  Apart from the case above, whenever we have code
      dropping temporary relation files as a part of the 2PC mechanism, we
      have to ensure that we have enough information to construct the correct
      filesystem path to the temp relation's relfilenode.
      
      This commit ensures that we persist a flag specifying whether a relation
      is temporary through the 2PC infrastructure (a variety of XLOG records
      and in-mem pending deletes structure), in order to access it while
      performing the file removal inside DropRelationFiles.
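A minimal sketch of why the flag has to travel with the pending delete: the entry is copied into a record payload, read back at removal time, and only then turned into a path. The struct, the function names, and the simplified `base/<dboid>/` layout are assumptions; only the `t_` prefix for temp relations comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pending-delete entry carrying the temp flag. */
typedef struct SketchPendingDelete
{
    uint32_t db_oid;
    uint32_t relfilenode;
    bool     is_temp;       /* persisted so that removal can build the right name */
} SketchPendingDelete;

/* Copy the entry into a record payload, as a WAL record would carry it. */
static void
sketch_write_record(unsigned char *rec, const SketchPendingDelete *pd)
{
    memcpy(rec, pd, sizeof(*pd));
}

/* Read the entry back when the record is processed. */
static void
sketch_read_record(SketchPendingDelete *pd, const unsigned char *rec)
{
    memcpy(pd, rec, sizeof(*pd));
}

/* Build the file path; temp relations get the "t_" prefix. */
static bool
sketch_relfile_path(char *buf, size_t buflen, const SketchPendingDelete *pd)
{
    int n;

    assert(buf != NULL && pd != NULL);
    n = snprintf(buf, buflen, "base/%u/%s%u",
                 (unsigned) pd->db_oid,
                 pd->is_temp ? "t_" : "",
                 (unsigned) pd->relfilenode);
    return n >= 0 && (size_t) n < buflen;
}
```

Without `is_temp` in the record, the removal step would look for the unprefixed name and leave the temp file behind.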
  7. 14 Jun, 2019 1 commit
    • Print global transaction info for commit prepared WAL records · dc257e08
      Jimmy Yih authored
      When creating WAL records for a twophase commit, we add the global
      transaction info to distributed commit and distributed forget records
      on the query dispatcher and to the commit prepared records on the
      query executors. However, pg_xlogdump only prints this information for
      the distributed commit and distributed forget records. We should do
      the same for commit prepared records to more easily string together
      transactions between QD and QE when comparing xlog dumps.
      
      On query dispatcher:
      distributed commit 2019-06-11 15:16:17.786319 PDT gid = 1560291226-0000000006, gxid = 6
      distributed forget  gid = 1560291226-0000000006, gxid = 6
      
      On query executor:
      commit prepared 747: 2019-06-11 15:16:17.786660 PDT
      
      Change query executor output to:
      commit prepared 747: 2019-06-11 15:16:17.786660 PDT gid = 1560291226-0000000006 gxid = 6
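Once both sides print the gid, matching a QD line to a QE line reduces to comparing the token after `gid = `. The helper below is a hypothetical sketch of that comparison, not part of pg_xlogdump; it assumes the gid ends at a comma, space, or newline, as in the sample lines above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the token that follows "gid = " into out.
 * Returns false if the line has no gid or the token does not fit.
 */
static bool
sketch_extract_gid(const char *line, char *out, size_t outlen)
{
    const char *p;
    size_t      len;

    assert(line != NULL && out != NULL);
    p = strstr(line, "gid = ");
    if (p == NULL)
        return false;

    p += strlen("gid = ");
    len = strcspn(p, ", \n");   /* the gid ends at a comma, space, or newline */
    if (len == 0 || len >= outlen)
        return false;

    memcpy(out, p, len);
    out[len] = '\0';
    return true;
}

/* True when both dump lines carry the same global transaction ID. */
static bool
sketch_same_transaction(const char *qd_line, const char *qe_line)
{
    char a[64];
    char b[64];

    return sketch_extract_gid(qd_line, a, sizeof a)
        && sketch_extract_gid(qe_line, b, sizeof b)
        && strcmp(a, b) == 0;
}
```

The old query executor line carries no gid, so the comparison fails for it. That failure is the gap the commit closes.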
  8. 01 May, 2014 1 commit
    • Rationalize common/relpath.[hc]. · 2d001904
      Tom Lane authored
      Commit a7301839 created rather a mess by
      putting dependencies on backend-only include files into include/common.
      We really shouldn't do that.  To clean it up:
      
      * Move TABLESPACE_VERSION_DIRECTORY back to its longtime home in
      catalog/catalog.h.  We won't consider this symbol part of the FE/BE API.
      
      * Push enum ForkNumber from relfilenode.h into relpath.h.  We'll consider
      relpath.h as the source of truth for fork numbers, since relpath.c was
      already partially serving that function, and anyway relfilenode.h was
      kind of a random place for that enum.
      
      * So, relfilenode.h now includes relpath.h rather than vice-versa.  This
      direction of dependency is fine.  (That allows most, but not quite all,
      of the existing explicit #includes of relpath.h to go away again.)
      
      * Push forkname_to_number from catalog.c to relpath.c, just to centralize
      fork number stuff a bit better.
      
      * Push GetDatabasePath from catalog.c to relpath.c; it was rather odd
      that the previous commit didn't keep this together with relpath().
      
      * To avoid needing relfilenode.h in common/, redefine the underlying
      function (now called GetRelationPath) as taking separate OID arguments,
      and make the APIs using RelFileNode or RelFileNodeBackend into macro
      wrappers.  (The macros have a potential multiple-eval risk, but none of
      the existing call sites have an issue with that; one of them had such a
      risk already anyway.)
      
      * Fix failure to follow the directions when "init" fork type was added;
      specifically, the errhint in forkname_to_number wasn't updated, and neither
      was the SGML documentation for pg_relation_size().
      
      * Fix tablespace-path-too-long check in CreateTableSpace() to account for
      fork-name component of maximum-length pathnames.  This requires putting
      FORKNAMECHARS into a header file, but it was rather useless (and
      actually unreferenced) where it was.
      
      The last couple of items are potentially back-patchable bug fixes,
      if anyone is sufficiently excited about them; but personally I'm not.
      
      Per a gripe from Christoph Berg about how include/common wasn't
      self-contained.
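The pattern described for GetRelationPath, a function taking separate OID arguments plus a macro wrapper for struct callers, and the multiple-evaluation risk that comes with the macro, can be sketched as follows. The names, the signature, and the `spc/db/rel` output format are illustrative assumptions; the real function takes further arguments and produces real on-disk paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for RelFileNode, which lives in a backend-only header. */
typedef struct SketchRelFileNode
{
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
} SketchRelFileNode;

/* Underlying function: plain OID arguments, so it needs no struct header. */
static bool
sketch_get_relation_path(char *buf, size_t buflen,
                         uint32_t db_oid, uint32_t spc_oid, uint32_t rel_oid)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, buflen, "%u/%u/%u",
                 (unsigned) spc_oid, (unsigned) db_oid, (unsigned) rel_oid);
    return n >= 0 && (size_t) n < buflen;
}

/* Macro wrapper for struct callers; note that rnode is expanded three times. */
#define sketch_relpath(buf, buflen, rnode) \
    sketch_get_relation_path((buf), (buflen), \
                             (rnode).dbNode, (rnode).spcNode, (rnode).relNode)

/* Helper with a side effect, used to make the repeated evaluation visible. */
static int sketch_eval_count = 0;

static SketchRelFileNode *
sketch_counted(SketchRelFileNode *node)
{
    sketch_eval_count++;
    return node;
}
```

Passing `*sketch_counted(&node)` to the macro bumps the counter three times, once per field access. Passing a plain variable, as ordinary call sites do, is unaffected.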
  9. 08 Jan, 2014 1 commit
  10. 31 Oct, 2013 1 commit
  11. 02 Jul, 2013 1 commit
    • Use an MVCC snapshot, rather than SnapshotNow, for catalog scans. · 568d4138
      Robert Haas authored
      SnapshotNow scans have the undesirable property that, in the face of
      concurrent updates, the scan can fail to see either the old or the new
      versions of the row.  In many cases, we work around this by requiring
      DDL operations to hold AccessExclusiveLock on the object being
      modified; in some cases, the existing locking is inadequate and random
      failures occur as a result.  This commit doesn't change anything
      related to locking, but will hopefully pave the way to allowing lock
      strength reductions in the future.
      
      The major issue that has held us back from making this change in the past
      is that taking an MVCC snapshot is significantly more expensive than
      using a static special snapshot such as SnapshotNow.  However, testing
      of various worst-case scenarios reveals that this problem is not
      severe except under fairly extreme workloads.  To mitigate those
      problems, we avoid retaking the MVCC snapshot for each new scan;
      instead, we take a new snapshot only when invalidation messages have
      been processed.  The catcache machinery already requires that
      invalidation messages be sent before releasing the related heavyweight
      lock; else other backends might rely on locally-cached data rather
      than scanning the catalog at all.  Thus, making snapshot reuse
      dependent on the same guarantees shouldn't break anything that wasn't
      already subtly broken.
      
      Patch by me.  Review by Michael Paquier and Andres Freund.
  12. 30 May, 2013 1 commit
  13. 22 Feb, 2013 1 commit
    • Move relpath() to libpgcommon · a7301839
      Alvaro Herrera authored
      This enables non-backend code, such as pg_xlogdump, to use it easily.
      The previous location, in src/backend/catalog/catalog.c, made that
      essentially impossible because that file depends on many backend-only
      facilities; so this needs to live separately.
  14. 02 Jan, 2013 1 commit
  15. 29 Nov, 2012 1 commit