1. 13 Jan 2018, 3 commits
    • Fix deletion of AO and AOCS tables, to remove all segments. · 175c25e8
      Committed by Heikki Linnakangas
      This hopefully fixes the gp_replica_check failures we're seeing in the
      pipeline.
      175c25e8
    • Remove a lot of persistent table and mirroring stuff. · 5c158ff3
      Committed by Heikki Linnakangas
      * Revert almost all the changes in smgr.c / md.c, to not go through
        the Mirrored* APIs.
      
      * Remove mmxlog stuff. Use upstream "pending relation deletion" code
        instead.
      
      * Get rid of multiple startup passes. Now it's just a single pass like
        in the upstream.
      
      * Revert the way database drop/create are handled to the way it is in
        upstream. Doesn't use PT anymore, but accesses file system directly,
        and WAL-logs a single CREATE/DROP DATABASE WAL record.
      
      * Get rid of MirroredLock
      
      * Remove a few tests that were specific to persistent tables.
      
      * Plus a lot of little removals and reverts to upstream code.
      5c158ff3
    • Remove cdbfilerepprimary.c. · 1d38b8e1
      Committed by Ashwin Agrawal
      This was a painful one to untangle, but it seems done now. Though if any
      shake-up happens, this should be the primary suspect.
      1d38b8e1
  2. 21 Nov 2017, 1 commit
    • Move some GPDB-specific code out of smgr.c and md.c. · 306b189d
      Committed by Heikki Linnakangas
      For clarity, and to make merging easier.
      
      The code to manage the hash table of "pending resync EOFs" for append-only
      tables is moved to smgr_ao.c. One notable change here is that the
      pendingDeletesPerformed flag is removed. It was used to track whether there
      are any pending deletes, or any pending AO table resyncs, but we might as
      well check the pending delete list and the pending syncs hash table
      directly; it's hardly any slower than checking a separate boolean.
      
      There are still plenty of GPDB changes in smgr.c, but this is a good step
      forward.
      306b189d
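The flag removal described in this commit can be sketched as follows. This is an illustrative stand-in, not the actual GPDB code: `smgrHavePendingWork`, `pendingDeletes`, and `pendingSyncEntries` are hypothetical names for the pending-delete list and the pending-syncs hash table the message refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the real structures (not actual GPDB symbols). */
typedef struct PendingDelete
{
    struct PendingDelete *next;
} PendingDelete;

static PendingDelete *pendingDeletes = NULL; /* list of pending relation deletes */
static int pendingSyncEntries = 0;           /* entries in the AO resync hash */

/*
 * Instead of maintaining a separate pendingDeletesPerformed boolean in every
 * code path that touches either structure, derive "is there pending work?"
 * directly. Both checks are O(1), hardly slower than reading a flag.
 */
static bool
smgrHavePendingWork(void)
{
    return pendingDeletes != NULL || pendingSyncEntries > 0;
}
```

The win is that no code path can forget to update the flag, since there is no flag to update.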
  3. 24 Jun 2017, 1 commit
  4. 07 Mar 2017, 3 commits
    • Checkpointer and BgWriter code closer to PG 9.2. · a453e7a3
      Committed by Ashwin Agrawal
      Rename checkpoint.c to checkpointer.c, move the code from bgwriter.c to
      checkpointer.c, and rename most of the corresponding data structures to
      reflect the clear ownership and association. This commit brings it as
      close as possible to PostgreSQL 9.2.
      
      Reference to PostgreSQL related commits:
      commit 806a2aee
          Split work of bgwriter between 2 processes: bgwriter and checkpointer.
      commit bf405ba8
          Add new file for checkpointer.c
      commit 8f28789b
          Rename BgWriterShmem/Request to CheckpointerShmem/Request
      commit d843589e5ab361dd4738dab5c9016e704faf4153
          Fix management of pendingOpsTable in auxiliary processes.
      a453e7a3
    • Correctly maintain pendingOpsTable in checkpoint process. · 0291ff60
      Committed by Ashwin Agrawal, Asim R P and Xin Zhang
      We had partially pulled the fix to separate checkpoint and bgwriter
      processes and introduced a bug where pendingOpsTable was maintained in
      both the processes.  The pendingOpsTable records pending fsync
      requests.  Only checkpoint process should keep it.  Bgwriter should
      only write out dirty pages to OS cache.  Apparently, upstream also had
      this same bug and it was fixed in
      d843589e5ab361dd4738dab5c9016e704faf4153
      
      Also ensure that background writer sweeps buffers even in the first run after
      checkpoint.  There is no reason to hold off until next run and this is how it
      works in upstream.
      
      Fixes issue discussed on mailing list:
      https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/PHKuQPNwWs0
      0291ff60
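The ownership rule this fix restores can be sketched as below. The names and the counters are illustrative stand-ins, not the real PostgreSQL code: the point is only that exactly one process may absorb fsync requests into its pendingOpsTable, while every other process forwards them.

```c
#include <assert.h>

/* Hypothetical process tags for this sketch. */
typedef enum { PROC_STARTUP, PROC_BGWRITER, PROC_CHECKPOINTER } ProcType;

static int pendingOps = 0;  /* stand-in for the checkpointer's hash table */
static int forwarded  = 0;  /* requests queued for the checkpointer */

/*
 * Only the checkpointer may remember a request locally; the bgwriter (which
 * now only writes out dirty pages) and everyone else must forward it, or the
 * request would sit in a table nobody ever fsyncs from.
 */
static void
remember_fsync_request(ProcType self)
{
    if (self == PROC_CHECKPOINTER)
        pendingOps++;       /* absorb into the local table */
    else
        forwarded++;        /* forward instead of remembering */
}
```

The bug was precisely that two processes took the `pendingOps++` branch, so each saw only half the requests.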
    • pg_regress test to validate fsync requests are not lost. · 85b7754d
      Committed by Ashwin Agrawal and Xin Zhang
      The commit includes a UDF to walk dirty shared buffers and a new fault
      `fault_counter` to count the number of files fsync'ed by checkpointer process.
      
      Also another new fault `bg_buffer_sync_default_logic` to flush all buffers for
      BgBufferSync() for the background writer process.
      85b7754d
  5. 03 Mar 2017, 1 commit
  6. 20 Dec 2016, 1 commit
    • Add support to pg_upgrade for upgrading Greenplum clusters · 675b2991
      Committed by Heikki Linnakangas
      This commit substantially rewrites pg_upgrade to handle upgrading a
      Greenplum cluster from 4.3 to 5.0. The Greenplum specifics of pg_upgrade
      are documented in contrib/pg_upgrade/README.gpdb. A summary of the
      changes is listed below:
      
       - Make pg_upgrade pass the pre-checks against GPDB 4.3.
       - Restore dumped schema in utility mode: pg_upgrade is executed on a
         single server in offline mode so ensure we are using utility mode.
       - Disable pg_upgrade checks that don't apply when upgrading to 8.3:
         When support for upgrading to Greenplum 6.0 is added, the checks that
         make sense to backport will need to be re-added.
       - Support AO/AOCS table: This bumps the AO table version number, and
         adds a conversion routine for numeric attributes. The on-disk format
         of numerics changed between PostgreSQL 8.3 and 8.4. With this commit,
         we can distinguish between AO segments created in the old format and
         the new, and read both formats. New AO segments are always created in
         the new format. Also performs a check for AO tables having NUMERIC
         attributes without free segfiles. Since AO table segments cannot be
         rewritten if there are no free segfiles, issue a warning if such a
         table is encountered during the upgrade.
       - Add code to convert heap pages offline: Bumps heap page format version
         number. While this isn't strictly necessary, when we're doing the
         conversion off-line, it reduces confusion if something goes wrong.
       - Add check for the money datatype: the upgrade doesn't support the money
         datatype, so check for its presence and abort the upgrade if found.
       - Create new Oid in QD and pass new Oids in dump for pg_upgrade on QE:
         When upgrading from GPDB4 to 5, we need to create new arraytypes for
         the base relation rowtypes in the QD, but we also need to dispatch
         these new OIDs to the QEs. Objects assigning InvalidOid in the Oid
         dispatcher will cause a new Oid to be assigned. Once the new cluster
         is restored, dump the new Oids into a separate dumpfile which isn't
         unlinked on exit. If this file is placed into the cwd of pg_upgrade
         on the QEs, it will be pulled into the db dump and used during
         restoring, thus "dispatching" the Oids from the QD even though they
         are offline. pg_upgrade doesn't at this point know if it's running
         at a QD or a QE so it will always dump this file and include the
         InvalidOid markers.
       - gp_relation_node is reset and rebuilt during upgrade once the data
         files from the old cluster are available to the new cluster. This
         change required altering how checkpoints are requested in the
         backend.
       - Mark indexes as invalid to ensure they are rebuilt in the new
         cluster.
       - Copy the pg_distributedlog from old to new during upgrade: We need
         the distributedlog in the new cluster to be able to start up once
         the upgrade has pulled over the clog.
       - Don't delete dumps when running with --debug: While not specific to
         Greenplum, this is a local addition which greatly helps testing
         and development of pg_upgrade.
      
      For testing purposes, a small test cluster created with Greenplum 4.3
      is included in contrib/pg_upgrade/test.
      
      Heikki Linnakangas, Daniel Gustafsson and Dave Cramer
      675b2991
  7. 10 May 2016, 1 commit
  8. 26 Nov 2015, 1 commit
  9. 28 Oct 2015, 1 commit
  10. 26 Jun 2012, 1 commit
    • Backport fsync queue compaction logic to all supported branches. · ef0f9dde
      Committed by Robert Haas
      This backports commit 7f242d88,
      except for the counter in pg_stat_bgwriter.  The underlying problem
      (namely, that a full fsync request queue causes terrible checkpoint
      behavior) continues to be reported in the wild, and this code seems
      to be safe and robust enough to risk back-porting the fix.
      ef0f9dde
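The compaction idea from the backported commit can be sketched as follows. This is a simplified illustration under stated assumptions: real queue entries key on (relfilenode, fork, segment number), modeled here as plain ints, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_MAX 8

/* Stand-in for the shared fsync request queue. */
static int queue[QUEUE_MAX];
static int queue_len = 0;

/*
 * When the queue fills up, drop duplicate entries (keeping the first of
 * each) instead of forcing the requesting backend to fsync the file itself,
 * which is what caused the terrible checkpoint behavior. Returns the new
 * queue length.
 */
static int
compact_fsync_request_queue(void)
{
    int n = 0;

    for (int i = 0; i < queue_len; i++)
    {
        bool seen = false;

        for (int j = 0; j < n; j++)
        {
            if (queue[j] == queue[i])
            {
                seen = true;
                break;
            }
        }
        if (!seen)
            queue[n++] = queue[i];
    }
    queue_len = n;
    return n;
}
```

Duplicates are common because every dirty write to the same segment enqueues another request, so compaction usually frees substantial space.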
  11. 27 Jun 2009, 1 commit
    • Cleanup and code review for the patch that made bgwriter active during · 2de48a83
      Committed by Tom Lane
      archive recovery.  Invent a separate state variable and inquiry function
      for XLogInsertAllowed() to clarify some tests and make the management of
      writing the end-of-recovery checkpoint less klugy.  Fix several places
      that were incorrectly testing InRecovery when they should be looking at
      RecoveryInProgress or XLogInsertAllowed (because they will now be executed
      in the bgwriter not startup process).  Clarify handling of bad LSNs passed
      to XLogFlush during recovery.  Use a spinlock for setting/testing
      SharedRecoveryInProgress.  Improve quite a lot of comments.
      
      Heikki and Tom
      2de48a83
  12. 26 Jun 2009, 1 commit
    • Fix some serious bugs in archive recovery, now that bgwriter is active · 7e48b77b
      Committed by Heikki Linnakangas
      during it:
      
      When bgwriter is active, the startup process can't perform mdsync() correctly
      because it won't see the fsync requests accumulated in bgwriter's private
      pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery
      checkpoint as well, when it's active.
      
      When bgwriter is active (= archive recovery), the startup process must not
      accumulate fsync requests to its own pendingOpsTable, since bgwriter won't
      see them there when it performs restartpoints. Make startup process drop its
      pendingOpsTable when bgwriter is launched to avoid that.
      
      Update minimum recovery point one last time when leaving archive recovery.
      It won't be updated by the end-of-recovery checkpoint because XLogFlush()
      sees us as out of recovery already.
      
      This fixes bug #4879 reported by Fujii Masao.
      7e48b77b
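The second fix, dropping the startup process's private table when bgwriter launches, can be sketched like this. All names are illustrative, not the actual functions; the counter stands in for the startup process's private pendingOpsTable.

```c
#include <assert.h>
#include <stdbool.h>

static int  startup_pending_ops = 0;  /* startup's private table (stand-in) */
static bool bgwriter_active = false;

/* Startup may accumulate requests only while bgwriter is not running. */
static void
startup_remember_fsync(void)
{
    if (!bgwriter_active)
        startup_pending_ops++;
}

/*
 * Once bgwriter is active it owns fsync bookkeeping; anything left in the
 * startup process's table would be invisible at restartpoints, so drop it.
 */
static void
launch_bgwriter(void)
{
    bgwriter_active = true;
    startup_pending_ops = 0;
}
```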
  13. 11 Jun 2009, 1 commit
  14. 12 Mar 2009, 1 commit
    • Code review for dtrace probes added (so far) to 8.4. Adjust placement of · e04810e8
      Committed by Tom Lane
      some bufmgr probes, take out redundant and memory-leak-inducing path arguments
      to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to
      recalculate space used in sort__done, clean up formatting in places where
      I'm not sure pgindent will do a nice job by itself.
      e04810e8
  15. 12 Jan 2009, 1 commit
  16. 02 Jan 2009, 1 commit
  17. 17 Dec 2008, 1 commit
    • The attached patch contains a couple of fixes in the existing probes and · 5a90bc1f
      Committed by Bruce Momjian
      includes a few new ones.
      
      - Fixed compilation errors on OS X for probes that use typedefs
      - Fixed a number of probes to pass ForkNumber per the relation forks
      patch
      - The new probes are those that were taken out from the previous
      submitted patch and required simple fixes. Will submit the other probes
      that may require more discussion in a separate patch.
      
      Robert Lor
      5a90bc1f
  18. 14 Nov 2008, 1 commit
  19. 11 Nov 2008, 1 commit
  20. 11 Aug 2008, 1 commit
    • Introduce the concept of relation forks. An smgr relation can now consist · 3f0e808c
      Committed by Heikki Linnakangas
      of multiple forks, and each fork can be created and grown separately.
      
      The bulk of this patch is about changing the smgr API to include an extra
      ForkNumber argument in every smgr function. Also, smgrscheduleunlink and
      smgrdounlink no longer implicitly call smgrclose, because other forks might
      still exist after unlinking one. The callers of those functions have been
      modified to call smgrclose instead.
      
      This patch in itself doesn't have any user-visible effect, but provides the
      infrastructure needed for upcoming patches. The additional forks envisioned
      are a rewritten FSM implementation that doesn't rely on a fixed-size shared
      memory block, and a visibility map to allow skipping portions of a table in
      VACUUM that have no dead tuples.
      3f0e808c
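The fork concept can be sketched as below. This is a simplified illustration, not the exact upstream API: the enum values mirror the forks this commit envisions, while `fork_path` and the suffix strings are hypothetical helpers showing how each fork maps to its own file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Every smgr call now carries one of these. */
typedef enum ForkNumber
{
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,            /* free space map (an envisioned later patch) */
    VISIBILITYMAP_FORKNUM   /* visibility map (an envisioned later patch) */
} ForkNumber;

/* Non-main forks get a filename suffix; exact suffixes are illustrative. */
static const char *const forkSuffixes[] = {"", "_fsm", "_vm"};

/* Build the on-disk name for one fork of a relation, e.g. "16384_fsm". */
static const char *
fork_path(const char *relfilenode, ForkNumber forknum)
{
    static char buf[64];

    snprintf(buf, sizeof(buf), "%s%s", relfilenode, forkSuffixes[forknum]);
    return buf;
}
```

Because each fork is a separate file that grows independently, smgrdounlink can no longer implicitly close the relation: unlinking one fork must leave the others usable, which is why callers now call smgrclose explicitly.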
  21. 02 May 2008, 1 commit
    • Remove the recently added USE_SEGMENTED_FILES option, and indeed remove all · 3c6248a8
      Committed by Tom Lane
      support for a nonsegmented mode from md.c.  Per recent discussions, there
      doesn't seem to be much value in a "never segment" option as opposed to
      segmenting with a suitably large segment size.  So instead provide a
      configure-time switch to set the desired segment size in units of gigabytes.
      While at it, expose a configure switch for BLCKSZ as well.
      
      Zdenek Kotala
      3c6248a8
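With segmentation now unconditional, md.c's segment arithmetic reduces to the following sketch. The values are illustrative defaults: configure's segment-size switch sets the segment size in gigabytes and a separate switch sets BLCKSZ, so these constants vary per build.

```c
#include <assert.h>

#define BLCKSZ      8192
#define RELSEG_SIZE ((1024u * 1024u * 1024u) / BLCKSZ)  /* blocks per 1 GB segment */

/* Which segment file ("relfilenode.N") holds this block? */
static unsigned
seg_for_block(unsigned blkno)
{
    return blkno / RELSEG_SIZE;
}

/* Byte offset of the block within that segment file. */
static unsigned long
offset_in_seg(unsigned blkno)
{
    return (unsigned long) (blkno % RELSEG_SIZE) * BLCKSZ;
}
```

A "never segment" build is thus equivalent to configuring a segment size larger than any relation you expect, which is why the separate option carried little value.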
  22. 18 Apr 2008, 2 commits
    • Fix two race conditions between the pending unlink mechanism that was put in · b8c58230
      Committed by Heikki Linnakangas
      place to prevent reusing relation OIDs before next checkpoint, and DROP
      DATABASE. First, if a database was dropped, bgwriter would still try to unlink
      the files that the rmtree() call by the DROP DATABASE command has already
      deleted, or is just about to delete. Second, if a database is dropped, and
      another database is created with the same OID, bgwriter would in the worst
      case delete a relation in the new database that happened to get the same OID
      as a dropped relation in the old database.
      
      To fix these race conditions:
      - make rmtree() ignore ENOENT errors. This fixes the 1st race condition.
      - make ForgetDatabaseFsyncRequests forget unlink requests as well.
      - force checkpoint on in dropdb on all platforms
      
      Since ForgetDatabaseFsyncRequests() is asynchronous, the 2nd change isn't
      enough on its own to fix the problem of dropping and creating a database with
      same OID, but forcing a checkpoint on DROP DATABASE makes it sufficient.
      
      Per Tom Lane's bug report and proposal. Backpatch to 8.3.
      b8c58230
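The first fix, tolerating ENOENT, can be sketched as follows. This is illustrative: upstream puts the tolerance inside rmtree() and the bgwriter's unlink path, and the function name here is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * An ENOENT here just means the other side of the race, DROP DATABASE's
 * rmtree(), deleted the file first; that is success, not an error. Any
 * other errno is a genuine failure.
 */
static bool
unlink_ignoring_enoent(const char *path)
{
    if (unlink(path) < 0)
        return errno == ENOENT;
    return true;
}
```

The second race (a new database reusing the dropped database's OID) cannot be fixed this way, which is why the commit additionally forces a checkpoint in dropdb so stale unlink requests are flushed before the OID can be reused.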
    • Fix two race conditions between the pending unlink mechanism that was put in · 9cb91f90
      Committed by Heikki Linnakangas
      place to prevent reusing relation OIDs before next checkpoint, and DROP
      DATABASE. First, if a database was dropped, bgwriter would still try to unlink
      the files that the rmtree() call by the DROP DATABASE command has already
      deleted, or is just about to delete. Second, if a database is dropped, and
      another database is created with the same OID, bgwriter would in the worst
      case delete a relation in the new database that happened to get the same OID
      as a dropped relation in the old database.
      
      To fix these race conditions:
      - make rmtree() ignore ENOENT errors. This fixes the 1st race condition.
      - make ForgetDatabaseFsyncRequests forget unlink requests as well.
      - force checkpoint on in dropdb on all platforms
      
      Since ForgetDatabaseFsyncRequests() is asynchronous, the 2nd change isn't
      enough on its own to fix the problem of dropping and creating a database with
      same OID, but forcing a checkpoint on DROP DATABASE makes it sufficient.
      
      Per Tom Lane's bug report and proposal. Backpatch to 8.3.
      9cb91f90
  23. 11 Mar 2008, 1 commit
  24. 02 Jan 2008, 1 commit
  25. 16 Nov 2007, 5 commits
  26. 03 Jul 2007, 1 commit
    • Fix incorrect comment about the timing of AbsorbFsyncRequests() during · 83aaebba
      Committed by Tom Lane
      checkpoint.  The comment claimed that we could do this anytime after
      setting the checkpoint REDO point, but actually BufferSync is relying
      on the assumption that buffers dumped by other backends will be fsync'd
      too.  So we really could not do it any sooner than we are doing it.
      83aaebba
  27. 13 Apr 2007, 1 commit
    • Rearrange mdsync() looping logic to avoid the problem that a sufficiently · 995ba280
      Committed by Tom Lane
      fast flow of new fsync requests can prevent mdsync() from ever completing.
      This was an unforeseen consequence of a patch added in Mar 2006 to prevent
      the fsync request queue from overflowing.  Problem identified by Heikki
      Linnakangas and independently by ITAGAKI Takahiro; fix based on ideas from
      Takahiro-san, Heikki, and Tom.
      
      Back-patch as far as 8.1 because a previous back-patch introduced the problem
      into 8.1 ...
      995ba280
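The core of the rearranged loop can be sketched with a cycle counter, as below. This is a simplified illustration with hypothetical names: each queued request is stamped with the sync cycle in which it arrived, and a pass processes only requests from earlier cycles, so a steady stream of new requests can no longer keep mdsync() looping forever.

```c
#include <assert.h>

typedef struct
{
    int cycle;   /* sync cycle when the request was queued */
    int done;    /* has it been fsync'd? */
} FsyncEntry;

static int mdsync_cycle = 0;

/* One mdsync() pass: bump the cycle, then handle only older requests. */
static int
mdsync_pass(FsyncEntry *entries, int n)
{
    int processed = 0;

    mdsync_cycle++;  /* requests arriving from now on belong to a new cycle */
    for (int i = 0; i < n; i++)
    {
        if (!entries[i].done && entries[i].cycle < mdsync_cycle)
        {
            entries[i].done = 1;  /* the real code would fsync the file here */
            processed++;
        }
    }
    return processed;
}
```

Requests stamped with the current cycle simply wait for the next pass, bounding the work of any single pass.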
  28. 18 Jan 2007, 1 commit
  29. 17 Jan 2007, 1 commit
    • Revise bgwriter fsync-request mechanism to improve robustness when a table · 6d660587
      Committed by Tom Lane
      is deleted.  A backend about to unlink a file now sends a "revoke fsync"
      request to the bgwriter to make it clean out pending fsync requests.  There
      is still a race condition where the bgwriter may try to fsync after the unlink
      has happened, but we can resolve that by rechecking the fsync request queue
      to see if a revoke request arrived meanwhile.  This eliminates the former
      kluge of "just assuming" that an ENOENT failure is okay, and lets us handle
      the fact that on Windows it might be EACCES too without introducing any
      questionable assumptions.  After an idea of mine improved by Magnus.
      
      The HEAD patch doesn't apply cleanly to 8.2, but I'll see about a back-port
      later.  In the meantime this could do with some testing on Windows; I've been
      able to force it through the code path via ENOENT, but that doesn't prove that
      it actually fixes the Windows problem ...
      6d660587
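The revoke operation can be sketched as below. Names and the flat array are illustrative stand-ins: real entries key on (relfilenode, segment number) in a hash table, and the revoke message travels through the shared request queue.

```c
#include <assert.h>

#define MAXREQUESTS 8

/* Stand-in for the bgwriter's pending-fsync bookkeeping. */
static int pending[MAXREQUESTS];
static int npending = 0;

static void
remember_fsync(int key)
{
    pending[npending++] = key;
}

/*
 * The "revoke fsync" request: before a backend unlinks a relation file, it
 * asks the bgwriter to drop every pending fsync for that key, so a later
 * fsync cannot fail on the vanished file (ENOENT, or EACCES on Windows).
 */
static void
forget_fsync_requests(int key)
{
    int n = 0;

    for (int i = 0; i < npending; i++)
        if (pending[i] != key)
            pending[n++] = pending[i];
    npending = n;
}
```

The remaining race, where the bgwriter fsyncs after the unlink but before seeing the revoke, is resolved by rechecking the request queue for a revoke when the fsync fails.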
  30. 06 Jan 2007, 1 commit
  31. 04 Jan 2007, 1 commit
    • Clean up smgr.c/md.c APIs as per discussion a couple months ago. Instead of · ef072219
      Committed by Tom Lane
      having md.c return a success/failure boolean to smgr.c, which was just going
      to elog anyway, let md.c issue the elog messages itself.  This allows better
      error reporting, particularly in cases such as "short read" or "short write"
      which Peter was complaining of.  Also, remove the kluge of allowing mdread()
      to return zeroes from a read-beyond-EOF: this is now an error condition
      except when InRecovery or zero_damaged_pages = true.  (Hash indexes used to
      require that behavior, but no more.)  Also, enforce that mdwrite() is to be
      used for rewriting existing blocks while mdextend() is to be used for
      extending the relation EOF.  This restriction lets us get rid of the old
      ad-hoc defense against creating huge files by an accidental reference to
      a bogus block number: we'll only create new segments in mdextend() not
      mdwrite() or mdread().  (Again, when InRecovery we allow it anyway, since
      we need to allow updates of blocks that were later truncated away.)
      Also, clean up the original makeshift patch for bug #2737: move the
      responsibility for padding relation segments to full length into md.c.
      ef072219