1. 21 Dec, 2011 1 commit
    • Avoid crashing when we have problems unlinking files post-commit. · d0024cd1
      Tom Lane committed
      smgrdounlink takes care to not throw an ERROR if it fails to unlink
      something, but that caution was rendered useless by commit
      33960006, which put an smgrexists call in
      front of it; smgrexists *does* throw error if anything looks funny, such
      as getting a permissions error from trying to open the file.  If that
      happens post-commit, you get a PANIC, and what's worse the same logic
      appears in the WAL replay code, so the database even fails to restart.
      
      Restore the intended behavior by removing the smgrexists call --- it isn't
      accomplishing anything that we can't do better by adjusting mdunlink's
      ideas of whether it ought to warn about ENOENT or not.
      
      Per report from Joseph Shraibman of unrecoverable crash after trying to
      drop a table whose FSM fork had somehow gotten chmod'd to 000 permissions.
      Backpatch to 8.4, where the bogus coding was introduced.
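The intended behavior described above (never raise an error while unlinking, and decide separately whether a missing file deserves a warning) can be sketched roughly as follows. This is a minimal illustration, not the code in md.c: `unlink_without_error` and its `warn_on_enoent` parameter are hypothetical names, and warnings simply go to stderr.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL's mdunlink: try to remove a file
 * without ever raising a fatal error.  Returns true if the file is gone
 * afterwards (removed now, or already missing), false otherwise.
 */
static bool
unlink_without_error(const char *path, bool warn_on_enoent)
{
    assert(path != NULL);

    if (unlink(path) == 0)
        return true;

    if (errno == ENOENT)
    {
        /* Already gone: optionally mention it, but treat it as success. */
        if (warn_on_enoent)
            fprintf(stderr, "WARNING: \"%s\" was already missing\n", path);
        return true;
    }

    /* Permissions problem, I/O error, and so on: warn and carry on. */
    fprintf(stderr, "WARNING: could not remove \"%s\": %s\n",
            path, strerror(errno));
    return false;
}
```

The point of the design is that the caller runs after commit, where escalating a failure would turn a leftover file into a crash; a warning plus a boolean result leaves that decision to the caller.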
  2. 02 Nov, 2011 1 commit
    • Split work of bgwriter between 2 processes: bgwriter and checkpointer. · 806a2aee
      Simon Riggs committed
      bgwriter is now a much less important process, responsible for page
      cleaning duties only. checkpointer is now responsible for checkpoints
      and so has a key role in shutdown. Later patches will correct doc
      references to the now old idea that bgwriter performs checkpoints.
      This has a beneficial effect on performance at high write rates, but is
      mainly a refactoring that makes later power-reduction changes easier by
      simplifying the previously tortuous code that was required to let page
      cleaning and checkpointing time-slice within the same process.
      
      Patch by me, Review by Dickson Guedes
  3. 27 Aug, 2011 1 commit
  4. 11 Jun, 2011 1 commit
    • Use "transient" files for blind writes, take 2 · fba105b1
      Alvaro Herrera committed
      "Blind writes" are a mechanism to push buffers down to disk when
      evicting them; since they may belong to different databases than the one
      a backend is connected to, the backend does not necessarily have a
      relation to link them to, and thus no way to blow them away.  We were
      keeping those files open indefinitely, which would cause a problem if
      the underlying table was deleted, because the operating system would not
      be able to reclaim the disk space used by those files.
      
      To fix, have bufmgr mark such files as transient to smgr; the lower
      layer is allowed to close the file descriptor when the current
      transaction ends.  We must take care that any other access to the
      file removes the transient marking, to prevent unnecessary, expensive
      system calls when evicting buffers that belong to our own database
      (files we are likely to need again soon).
      
      This commit fixes a bug in the previous one, which neglected to cleanly
      handle the LRU ring that fd.c uses to manage open files, and caused an
      unacceptable failure just before beta2 and was thus reverted.
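The transient-file idea can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `TrackedFile`, `blind_write`, `ordinary_access` and `close_transient_files` are hypothetical names, a boolean stands in for a real file descriptor, and the rule that a blind write marks a file only when it opens it is a choice made for this example rather than something the commit message specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FILES 8

typedef struct
{
    bool open;        /* stands in for a real file descriptor */
    bool transient;   /* set for blind writes, cleared by ordinary access */
} TrackedFile;

static TrackedFile files[MAX_FILES];

/* Blind write: open if needed; a file opened here is marked transient. */
static void
blind_write(size_t slot)
{
    assert(slot < MAX_FILES);
    if (!files[slot].open)
    {
        files[slot].open = true;
        files[slot].transient = true;
    }
}

/* Ordinary access: keep the file open across transactions. */
static void
ordinary_access(size_t slot)
{
    assert(slot < MAX_FILES);
    files[slot].open = true;
    files[slot].transient = false;
}

/* End of transaction: close whatever is still marked transient. */
static size_t
close_transient_files(void)
{
    size_t closed = 0;

    for (size_t i = 0; i < MAX_FILES; i++)
    {
        if (files[i].open && files[i].transient)
        {
            files[i].open = false;
            files[i].transient = false;
            closed++;
        }
    }
    return closed;
}
```

Clearing the mark on ordinary access is what keeps files of the backend's own database open, so they are not reopened on every eviction.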
  5. 10 Jun, 2011 2 commits
    • Revert "Use "transient" files for blind writes" · 9261557e
      Alvaro Herrera committed
      This reverts commit 54d9e8c6, which
      caused a failure on the buildfarm.  Not a good thing to have just before
      a beta release.
    • Use "transient" files for blind writes · 54d9e8c6
      Alvaro Herrera committed
      "Blind writes" are a mechanism to push buffers down to disk when
      evicting them; since they may belong to different databases than the one
      a backend is connected to, the backend does not necessarily have a
      relation to link them to, and thus no way to blow them away.  We were
      keeping those files open indefinitely, which would cause a problem if
      the underlying table was deleted, because the operating system would not
      be able to reclaim the disk space used by those files.
      
      To fix, have bufmgr mark such files as transient to smgr; the lower
      layer is allowed to close the file descriptor when the current
      transaction ends.  We must take care that any other access to the
      file removes the transient marking, to prevent unnecessary, expensive
      system calls when evicting buffers that belong to our own database
      (files we are likely to need again soon).
  6. 12 Apr, 2011 1 commit
  7. 10 Apr, 2011 1 commit
  8. 29 Jan, 2011 1 commit
  9. 02 Jan, 2011 1 commit
  10. 14 Dec, 2010 1 commit
  11. 16 Nov, 2010 1 commit
    • Add new buffers_backend_fsync field to pg_stat_bgwriter. · 3134d886
      Robert Haas committed
      This new field counts the number of times that a backend which writes a
      buffer out to the OS must also fsync() it.  This happens when the
      bgwriter fsync request queue is full, and is generally detrimental to
      performance, so it's good to know when it's happening.  Along the way,
      log a new message at level DEBUG1 whenever we fail to hand off an fsync,
      so that the problem can also be seen in examination of log files
      (if the logging level is cranked up high enough).
      
      Greg Smith, with minor tweaks by me.
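What the new counter measures can be shown with a toy bounded queue. This is a sketch under assumptions, not the bgwriter code: `FsyncQueue`, `forward_fsync_request` and `backend_wrote_buffer` are hypothetical names, the capacity is deliberately tiny, and the backend's own fsync() call is represented only by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4    /* tiny on purpose, so overflow is easy to hit */

typedef struct
{
    int    pending[QUEUE_CAPACITY];   /* hypothetical file identifiers */
    size_t used;
    long   buffers_backend_fsync;     /* mirrors the new statistics field */
} FsyncQueue;

/* Try to hand the request to the background process; false if full. */
static bool
forward_fsync_request(FsyncQueue *q, int file_id)
{
    assert(q != NULL);
    if (q->used == QUEUE_CAPACITY)
        return false;
    q->pending[q->used++] = file_id;
    return true;
}

/* Called by a backend right after it has written a buffer to the OS. */
static void
backend_wrote_buffer(FsyncQueue *q, int file_id)
{
    if (!forward_fsync_request(q, file_id))
    {
        /* Hand-off failed: the backend would fsync() the file itself here. */
        q->buffers_backend_fsync++;
    }
}
```

With a capacity of 4 and six writes before the queue is drained, the counter ends at 2, which is the kind of signal the new field is meant to expose.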
  12. 21 Sep, 2010 1 commit
  13. 14 Aug, 2010 1 commit
    • Include the backend ID in the relpath of temporary relations. · debcec7d
      Robert Haas committed
      This allows us to reliably remove all leftover temporary relation
      files on cluster startup without reference to system catalogs or WAL;
      therefore, we no longer include temporary relations in XLOG_XACT_COMMIT
      and XLOG_XACT_ABORT WAL records.
      
      Since these changes require including a backend ID in each
      SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id
      field has been reduced from two bytes to one, and the maximum number
      of connections has been reduced from INT_MAX / 4 to 2^23-1.  It would
      be possible to remove these restrictions by increasing the size of
      SharedInvalidationMessage by 4 bytes, but right now that doesn't seem
      like a good trade-off.
      
      Review by Jaime Casanova and Tom Lane.
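Why a backend ID in the file name makes catalog-free cleanup possible can be sketched as below. The name pattern `t<backend ID>_<relfilenode>` is an assumption made for this example (the commit message does not spell out the format), and `temp_rel_filename` and `looks_like_temp_rel` are hypothetical helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Build "t<backend>_<relfilenode>" into buf; returns snprintf's result. */
static int
temp_rel_filename(char *buf, size_t buflen, int backend_id,
                  unsigned relfilenode)
{
    assert(backend_id >= 0);
    return snprintf(buf, buflen, "t%d_%u", backend_id, relfilenode);
}

/* True if name looks like t<digits>_<digits>, optionally with a suffix. */
static bool
looks_like_temp_rel(const char *name)
{
    const char *p = name;

    if (*p++ != 't')
        return false;
    if (!isdigit((unsigned char) *p))
        return false;
    while (isdigit((unsigned char) *p))
        p++;
    if (*p++ != '_')
        return false;
    if (!isdigit((unsigned char) *p))
        return false;
    while (isdigit((unsigned char) *p))
        p++;
    /* Allow a segment (".1") or fork ("_xyz") suffix after the numbers. */
    return *p == '\0' || *p == '.' || *p == '_';
}
```

A startup pass could then scan each database directory and remove every file for which the name test succeeds, without consulting system catalogs or WAL.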
  14. 26 Feb, 2010 1 commit
  15. 03 Jan, 2010 1 commit
  16. 06 Aug, 2009 1 commit
    • Improve error messages in md.c. When a filesystem operation like open() or · 23dc89d2
      Heikki Linnakangas committed
      fsync() fails, say "file" rather than "relation" when printing the filename.
      
      This makes messages that display block numbers a bit confusing. For example,
      in message 'could not read block 150000 of file "base/1234/5678.1"', 150000
      is the block number from the beginning of the relation, i.e. segment 0, not
      the 150000th block within that segment. Per discussion, users aren't usually
      interested in the exact location within the file, so we can live with that.
      
      To ease constructing error messages, add FilePathName(File) function to
      return the pathname of a virtual fd.
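The relationship between the relation-wide block number and the segment file in the example message can be worked through with a little arithmetic. This sketch assumes 1 GB segments of 8 kB blocks, that is 131072 blocks per segment; that figure is not stated in the commit message, and `locate_block` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCKS_PER_SEGMENT 131072u  /* assumed: 1 GB segments, 8 kB blocks */

typedef struct
{
    uint32_t segment;           /* N in "5678.N"; 0 means no suffix */
    uint32_t block_in_segment;  /* block offset inside that segment file */
} SegmentLocation;

/* Split a relation-wide block number into segment and offset. */
static SegmentLocation
locate_block(uint32_t relation_block)
{
    SegmentLocation loc;

    loc.segment = relation_block / BLOCKS_PER_SEGMENT;
    loc.block_in_segment = relation_block % BLOCKS_PER_SEGMENT;

    /* Round-trip check: the two parts must recombine to the input. */
    assert(loc.segment * BLOCKS_PER_SEGMENT + loc.block_in_segment
           == relation_block);
    return loc;
}
```

Under that assumption, block 150000 falls in segment 1 (hence the ".1" suffix) at block 18928 within that file, while the message keeps reporting 150000.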
  17. 27 Jun, 2009 1 commit
    • Cleanup and code review for the patch that made bgwriter active during · 2de48a83
      Tom Lane committed
      archive recovery.  Invent a separate state variable and inquiry function
      for XLogInsertAllowed() to clarify some tests and make the management of
      writing the end-of-recovery checkpoint less klugy.  Fix several places
      that were incorrectly testing InRecovery when they should be looking at
      RecoveryInProgress or XLogInsertAllowed (because they will now be executed
      in the bgwriter not startup process).  Clarify handling of bad LSNs passed
      to XLogFlush during recovery.  Use a spinlock for setting/testing
      SharedRecoveryInProgress.  Improve quite a lot of comments.
      
      Heikki and Tom
  18. 26 Jun, 2009 1 commit
    • Fix some serious bugs in archive recovery, now that bgwriter is active · 7e48b77b
      Heikki Linnakangas committed
      during it:
      
      When bgwriter is active, the startup process can't perform mdsync() correctly
      because it won't see the fsync requests accumulated in bgwriter's private
      pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery
      checkpoint as well, when it's active.
      
      When bgwriter is active (= archive recovery), the startup process must not
      accumulate fsync requests to its own pendingOpsTable, since bgwriter won't
      see them there when it performs restartpoints. Make startup process drop its
      pendingOpsTable when bgwriter is launched to avoid that.
      
      Update minimum recovery point one last time when leaving archive recovery.
      It won't be updated by the end-of-recovery checkpoint because XLogFlush()
      sees us as out of recovery already.
      
      This fixes bug #4879 reported by Fujii Masao.
  19. 11 Jun, 2009 1 commit
  20. 12 Mar, 2009 1 commit
    • Code review for dtrace probes added (so far) to 8.4. Adjust placement of · e04810e8
      Tom Lane committed
      some bufmgr probes, take out redundant and memory-leak-inducing path arguments
      to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to
      recalculate space used in sort__done, clean up formatting in places where
      I'm not sure pgindent will do a nice job by itself.
  21. 12 Jan, 2009 1 commit
  22. 02 Jan, 2009 1 commit
  23. 17 Dec, 2008 1 commit
    • The attached patch contains a couple of fixes in the existing probes and · 5a90bc1f
      Bruce Momjian committed
      includes a few new ones.
      
      - Fixed compilation errors on OS X for probes that use typedefs
      - Fixed a number of probes to pass ForkNumber per the relation forks
      patch
      - The new probes are those that were taken out from the previous
      submitted patch and required simple fixes. Will submit the other probes
      that may require more discussion in a separate patch.
      
      Robert Lor
  24. 14 Nov, 2008 1 commit
  25. 11 Nov, 2008 1 commit
  26. 11 Aug, 2008 1 commit
    • Introduce the concept of relation forks. An smgr relation can now consist · 3f0e808c
      Heikki Linnakangas committed
      of multiple forks, and each fork can be created and grown separately.
      
      The bulk of this patch is about changing the smgr API to include an extra
      ForkNumber argument in every smgr function. Also, smgrscheduleunlink and
      smgrdounlink no longer implicitly call smgrclose, because other forks might
      still exist after unlinking one. The callers of those functions have been
      modified to call smgrclose instead.
      
      This patch in itself doesn't have any user-visible effect, but provides the
      infrastructure needed for upcoming patches. The additional forks envisioned
      are a rewritten FSM implementation that doesn't rely on a fixed-size shared
      memory block, and a visibility map to allow skipping portions of a table in
      VACUUM that have no dead tuples.
  27. 02 May, 2008 1 commit
    • Remove the recently added USE_SEGMENTED_FILES option, and indeed remove all · 3c6248a8
      Tom Lane committed
      support for a nonsegmented mode from md.c.  Per recent discussions, there
      doesn't seem to be much value in a "never segment" option as opposed to
      segmenting with a suitably large segment size.  So instead provide a
      configure-time switch to set the desired segment size in units of gigabytes.
      While at it, expose a configure switch for BLCKSZ as well.
      
      Zdenek Kotala
  28. 18 Apr, 2008 1 commit
    • Fix two race conditions between the pending unlink mechanism that was put in · 9cb91f90
      Heikki Linnakangas committed
      place to prevent reusing relation OIDs before next checkpoint, and DROP
      DATABASE. First, if a database was dropped, bgwriter would still try to unlink
      the files that the rmtree() call by the DROP DATABASE command has already
      deleted, or is just about to delete. Second, if a database is dropped, and
      another database is created with the same OID, bgwriter would in the worst
      case delete a relation in the new database that happened to get the same OID
      as a dropped relation in the old database.
      
      To fix these race conditions:
      - make rmtree() ignore ENOENT errors. This fixes the 1st race condition.
      - make ForgetDatabaseFsyncRequests forget unlink requests as well.
      - force a checkpoint in dropdb on all platforms.
      
      Since ForgetDatabaseFsyncRequests() is asynchronous, the 2nd change isn't
      enough on its own to fix the problem of dropping and creating a database with
      same OID, but forcing a checkpoint on DROP DATABASE makes it sufficient.
      
      Per Tom Lane's bug report and proposal. Backpatch to 8.3.
  29. 11 Mar, 2008 1 commit
  30. 02 Jan, 2008 1 commit
  31. 16 Nov, 2007 5 commits
  32. 03 Jul, 2007 1 commit
    • Fix incorrect comment about the timing of AbsorbFsyncRequests() during · 83aaebba
      Tom Lane committed
      checkpoint.  The comment claimed that we could do this anytime after
      setting the checkpoint REDO point, but actually BufferSync is relying
      on the assumption that buffers dumped by other backends will be fsync'd
      too.  So we really could not do it any sooner than we are doing it.
  33. 13 Apr, 2007 1 commit
    • Rearrange mdsync() looping logic to avoid the problem that a sufficiently · 995ba280
      Tom Lane committed
      fast flow of new fsync requests can prevent mdsync() from ever completing.
      This was an unforeseen consequence of a patch added in Mar 2006 to prevent
      the fsync request queue from overflowing.  Problem identified by Heikki
      Linnakangas and independently by ITAGAKI Takahiro; fix based on ideas from
      Takahiro-san, Heikki, and Tom.
      
      Back-patch as far as 8.1 because a previous back-patch introduced the problem
      into 8.1 ...
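One way to keep a sync loop from being starved by a steady stream of new requests is to stamp each request with a cycle number and have a pass skip anything stamped after it began. The sketch below shows that general technique only; it is not claimed to be the committed fix, and `SyncTable`, `remember_request`, `sync_pass` and `enqueue_follow_up` are hypothetical names. Handled entries are never removed, which is acceptable for a demonstration but not for real use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQUESTS 64

typedef struct
{
    int      file_id;   /* hypothetical file identifier */
    unsigned cycle;     /* sync cycle during which the request arrived */
    bool     pending;
} SyncRequest;

typedef struct
{
    SyncRequest requests[MAX_REQUESTS];
    size_t      count;
    unsigned    current_cycle;
} SyncTable;

/* Record a request, stamped with the cycle that is current right now. */
static bool
remember_request(SyncTable *t, int file_id)
{
    assert(t != NULL);
    if (t->count == MAX_REQUESTS)
        return false;
    t->requests[t->count++] = (SyncRequest){file_id, t->current_cycle, true};
    return true;
}

/* Demo callback: every sync immediately produces one more request. */
static void
enqueue_follow_up(SyncTable *t, int file_id)
{
    remember_request(t, file_id + 100);
}

/*
 * One sync pass.  Requests arriving while the pass runs carry the new
 * cycle number and are left for the next pass, so the loop is bounded.
 * Returns the number of requests handled.
 */
static size_t
sync_pass(SyncTable *t, void (*on_sync)(SyncTable *, int))
{
    size_t done = 0;

    t->current_cycle++;     /* later arrivals get this newer stamp */
    for (size_t i = 0; i < t->count; i++)
    {
        SyncRequest *r = &t->requests[i];

        if (!r->pending || r->cycle == t->current_cycle)
            continue;       /* already handled, or too new for this pass */
        r->pending = false;
        done++;
        if (on_sync)
            on_sync(t, r->file_id);     /* may enqueue more requests */
    }
    return done;
}
```

Even when every handled request spawns another, each pass finishes after the requests that existed when it started, and the newcomers wait for the following pass.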
  34. 18 Jan, 2007 1 commit
  35. 17 Jan, 2007 1 commit
    • Revise bgwriter fsync-request mechanism to improve robustness when a table · 6d660587
      Tom Lane committed
      is deleted.  A backend about to unlink a file now sends a "revoke fsync"
      request to the bgwriter to make it clean out pending fsync requests.  There
      is still a race condition where the bgwriter may try to fsync after the unlink
      has happened, but we can resolve that by rechecking the fsync request queue
      to see if a revoke request arrived meanwhile.  This eliminates the former
      kluge of "just assuming" that an ENOENT failure is okay, and lets us handle
      the fact that on Windows it might be EACCES too without introducing any
      questionable assumptions.  After an idea of mine improved by Magnus.
      
      The HEAD patch doesn't apply cleanly to 8.2, but I'll see about a back-port
      later.  In the meantime this could do with some testing on Windows; I've been
      able to force it through the code path via ENOENT, but that doesn't prove that
      it actually fixes the Windows problem ...