1. 25 Jun, 2012 1 commit
    • Replace XLogRecPtr struct with a 64-bit integer. · 0ab9d1c4
      Heikki Linnakangas committed
      This simplifies code that needs to do arithmetic on XLogRecPtrs.
      
      To avoid changing on-disk format of data pages, the LSN on data pages is
      still stored in the old format. That should keep pg_upgrade happy. However,
      we have XLogRecPtrs embedded in the control file, and in the structs that
      are sent over the replication protocol, so this change breaks compatibility
      of pg_basebackup and server. I didn't do anything about this in this patch;
      per discussion on -hackers, the right thing to do would be to change the
      replication protocol to be architecture-independent, so that you could use
      a newer version of pg_receivexlog, for example, against an older server
      version.
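
To illustrate why a single integer simplifies arithmetic, the sketch below packs a two-field pointer into a 64-bit value, unpacks it again, and computes a byte distance with plain subtraction. All names here (`TwoPartRecPtr`, `RecPtr64`, `recptr_pack`, `recptr_unpack`, `recptr_distance`) are hypothetical and not taken from the patch; the only assumption is a layout of a 32-bit high part plus a 32-bit low part.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-field form: high 32 bits and low 32 bits kept separately. */
typedef struct TwoPartRecPtr
{
    uint32_t hi;
    uint32_t lo;
} TwoPartRecPtr;

/* Hypothetical single-integer form. */
typedef uint64_t RecPtr64;

/* Combine the two halves into one 64-bit position. */
static RecPtr64
recptr_pack(TwoPartRecPtr p)
{
    return ((uint64_t) p.hi << 32) | (uint64_t) p.lo;
}

/* Split a 64-bit position back into the two-field form. */
static TwoPartRecPtr
recptr_unpack(RecPtr64 v)
{
    TwoPartRecPtr p;

    p.hi = (uint32_t) (v >> 32);
    p.lo = (uint32_t) (v & 0xFFFFFFFFu);
    return p;
}

/* With a plain integer, the distance between two positions is a subtraction. */
static uint64_t
recptr_distance(RecPtr64 from, RecPtr64 to)
{
    assert(to >= from);
    return to - from;
}
```

With the two-field form, the same distance calculation would need explicit borrow handling whenever the low part wraps around.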
  2. 11 Jun, 2012 1 commit
  3. 14 May, 2012 1 commit
    • Update comments that became out-of-date with the PGXACT struct. · 9e4637bf
      Heikki Linnakangas committed
      When the "hot" members of PGPROC were split off to separate PGXACT structs,
      many PGPROC fields referred to in comments were moved to PGXACT, but the
      comments were neglected in the commit. Mostly this is just a search/replace
      of PGPROC with PGXACT, but the way the dummy PGPROC entries are created for
      prepared transactions changed more, making some of the comments totally
      bogus.
      
      Noah Misch
  4. 10 May, 2012 1 commit
    • Improve control logic for bgwriter hibernation mode. · 6308ba05
      Tom Lane committed
      Commit 6d90eaaa added a hibernation mode
      to the bgwriter to reduce the server's idle-power consumption.  However,
      its interaction with the detailed behavior of BgBufferSync's feedback
      control loop wasn't very well thought out.  That control loop depends
      primarily on the rate of buffer allocation, not the rate of buffer
      dirtying, so the hibernation mode has to be designed to operate only when
      no new buffer allocations are happening.  Also, the check for whether the
      system is effectively idle was not quite right and would fail to detect
      a constant low level of activity, thus allowing the bgwriter to go into
      hibernation mode in a way that would let the cycle time vary quite a bit,
      possibly further confusing the feedback loop.  To fix, move the wakeup
      support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into
      StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode
      unless no buffer allocations have happened recently.
      
      In addition, fix the delaying logic to remove the problem of possibly not
      responding to signals promptly, which was basically caused by trying to use
      the process latch's is_set flag for multiple purposes.  I can't prove it
      but I'm suspicious that that hack was responsible for the intermittent
      "postmaster does not shut down" failures we've been seeing in the buildfarm
      lately.  In any case it did nothing to improve the readability or
      robustness of the code.
      
      In passing, express the hibernation sleep time as a multiplier on
      BgWriterDelay, not a constant.  I'm not sure whether there's any value in
      exposing the longer sleep time as an independently configurable setting,
      but we can at least make it act like this for little extra code.
  5. 09 May, 2012 1 commit
    • Reduce idle power consumption of walwriter and checkpointer processes. · 5461564a
      Tom Lane committed
      This patch modifies the walwriter process so that, when it has not found
      anything useful to do for many consecutive wakeup cycles, it extends its
      sleep time to reduce the server's idle power consumption.  It reverts to
      normal as soon as it's done any successful flushes.  It's still true that
      during any async commit, backends check for completed, unflushed pages of
      WAL and signal the walwriter if there are any; so that in practice the
      walwriter can get awakened and returned to normal operation sooner than the
      sleep time might suggest.
      
      Also, improve the checkpointer so that it uses a latch and a computed delay
      time to not wake up at all except when it has something to do, replacing a
      previous hardcoded 0.5 sec wakeup cycle.  This also is primarily useful for
      reducing the server's power consumption when idle.
      
      In passing, get rid of the dedicated latch for signaling the walwriter in
      favor of using its procLatch, since that comports better with possible
      generic signal handlers using that latch.  Also, fix a pre-existing bug
      with failure to save/restore errno in walwriter's signal handlers.
      
      Peter Geoghegan, somewhat simplified by Tom
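
The errno save/restore fix mentioned above follows a general pattern for signal handlers, sketched here in isolation. The handler and helper names (`wakeup_handler`, `install_wakeup_handler`, `wakeup_pending`) are hypothetical, and the failing `close(-1)` call is only a stand-in for handler work that may overwrite errno; none of this is the walwriter's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Flag a (hypothetical) main loop would poll; sig_atomic_t is safe to set in a handler. */
static volatile sig_atomic_t wakeup_pending = 0;

static void
wakeup_handler(int signo)
{
    int save_errno = errno;     /* remember what the interrupted code saw */

    (void) signo;
    wakeup_pending = 1;

    /* Stand-in for real work: close(-1) fails and sets errno to EBADF. */
    (void) close(-1);

    errno = save_errno;         /* the interrupted code must not notice us */
}

/* Install the handler for SIGUSR1; returns 0 on success, -1 on failure. */
static int
install_wakeup_handler(void)
{
    return (signal(SIGUSR1, wakeup_handler) == SIG_ERR) ? -1 : 0;
}
```

Without the save/restore pair, code interrupted between a failing system call and its errno check could see the handler's EBADF instead of its own error.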
  6. 05 May, 2012 1 commit
    • Overdue code review for transaction-level advisory locks patch. · 71b9549d
      Tom Lane committed
      Commit 62c7bd31 had assorted problems, most
      visibly that it broke PREPARE TRANSACTION in the presence of session-level
      advisory locks (which should be ignored by PREPARE), as per a recent
      complaint from Stephen Rees.  More abstractly, the patch made the
      LockMethodData.transactional flag not merely useless but outright
      dangerous, because in point of fact that flag no longer tells you anything
      at all about whether a lock is held transactionally.  This fix therefore
      removes that flag altogether.  We now rely entirely on the convention
      already in use in lock.c that transactional lock holds must be owned by
      some ResourceOwner, while session holds are never so owned.  Setting the
      locallock struct's owner link to NULL thus denotes a session hold, and
      there is no redundant marker for that.
      
      PREPARE TRANSACTION now works again when there are session-level advisory
      locks, and it is also able to transfer transactional advisory locks to the
      prepared transaction, but for implementation reasons it throws an error if
      we hold both types of lock on a single lockable object.  Perhaps it will be
      worth improving that someday.
      
      Assorted other minor cleanup and documentation editing, as well.
      
      Back-patch to 9.1, except that in the 9.1 branch I did not remove the
      LockMethodData.transactional flag for fear of causing an ABI break for
      any external code that might be examining those structs.
  7. 18 Apr, 2012 1 commit
    • Tighten up error recovery for fast-path locking. · 53c5b869
      Robert Haas committed
      The previous code could cause a backend crash after BEGIN; SAVEPOINT a;
      LOCK TABLE foo (interrupted by ^C or statement timeout); ROLLBACK TO
      SAVEPOINT a; LOCK TABLE foo, and might have leaked strong-lock counts
      in other situations.
      
      Report by Zoltán Böszörményi; patch review by Jeff Davis.
  8. 22 Mar, 2012 1 commit
  9. 30 Jan, 2012 1 commit
    • Make group commit more effective. · 9b38d46d
      Heikki Linnakangas committed
      When a backend needs to flush the WAL, and someone else is already flushing
      the WAL, wait until it releases the WALInsertLock and check if we still need
      to do the flush or if the other backend already did the work for us, before
      acquiring WALInsertLock. This helps group commit, because when the WAL flush
      finishes, all the backends that were waiting for it can be woken up in one
      go, and they can all concurrently observe that they're done, rather than
      waking them up one by one in a cascading fashion.
      
      This is based on a new LWLock function, LWLockWaitUntilFree(), which has
      peculiar semantics. If the lock is immediately free, it grabs the lock and
      returns true. If it's not free, it waits until it is released, but then
      returns false without grabbing the lock. This is used in XLogFlush(), so
      that when the lock is acquired, the backend flushes the WAL, but if it's
      not, the backend first checks the current flush location before retrying.
      
      Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
      this patch as committed ended up being very different from that.
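
The "acquire if free, otherwise wait and return without acquiring" semantics described above can be modelled in a few lines. This is a simplified sketch using C11 atomics and a busy-wait loop in place of real sleeping; `SketchLock`, `sketch_wait_until_free`, `sketch_release`, and `flush_up_to` are hypothetical names, and the "flush" is just a store to a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lightweight lock: just a "held" flag. */
typedef struct SketchLock
{
    atomic_bool held;
} SketchLock;

/*
 * If the lock is free, take it and return true.  Otherwise wait until the
 * holder releases it and return false WITHOUT taking it.
 */
static bool
sketch_wait_until_free(SketchLock *lock)
{
    bool expected = false;

    if (atomic_compare_exchange_strong(&lock->held, &expected, true))
        return true;
    while (atomic_load(&lock->held))
        ;                       /* busy-wait; a real lock would sleep here */
    return false;
}

static void
sketch_release(SketchLock *lock)
{
    atomic_store(&lock->held, false);
}

/*
 * Caller pattern: flush only if nobody else already flushed far enough.
 * Returns true if this caller performed the flush itself.
 */
static bool
flush_up_to(SketchLock *lock, atomic_ullong *flushed_upto,
            unsigned long long target)
{
    while (atomic_load(flushed_upto) < target)
    {
        if (sketch_wait_until_free(lock))
        {
            bool did_flush = false;

            /* Re-check under the lock before doing any work. */
            if (atomic_load(flushed_upto) < target)
            {
                atomic_store(flushed_upto, target); /* stand-in for write + fsync */
                did_flush = true;
            }
            sketch_release(lock);
            return did_flush;
        }
        /* Another caller just finished; loop to re-check the flush position. */
    }
    return false;
}
```

When several waiters are woken by one release, each re-checks the flush position on its own, so none of them has to queue for the lock only to discover the work is already done.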
  10. 28 Jan, 2012 1 commit
  11. 02 Jan, 2012 1 commit
  12. 25 Nov, 2011 1 commit
    • Move "hot" members of PGPROC into a separate PGXACT array. · ed0b409d
      Robert Haas committed
      This speeds up snapshot-taking and reduces ProcArrayLock contention.
      Also, the PGPROC (and PGXACT) structures used by two-phase commit are
      now allocated as part of the main array, rather than in a separate
      array, and we keep ProcArray sorted in pointer order.  These changes
      are intended to minimize the number of cache lines that must be pulled
      in to take a snapshot, and testing shows a substantial increase in
      performance on both read and write workloads at high concurrencies.
      
      Pavan Deolasee, Heikki Linnakangas, Robert Haas
  13. 02 Nov, 2011 1 commit
  14. 10 Sep, 2011 1 commit
    • Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h. · a7801b62
      Tom Lane committed
      As per my recent proposal, this refactors things so that these typedefs and
      macros are available in a header that can be included in frontend-ish code.
      I also changed various headers that were undesirably including
      utils/timestamp.h to include datatype/timestamp.h instead.  Unsurprisingly,
      this showed that half the system was getting utils/timestamp.h by way of
      xlog.h.
      
      No actual code changes here, just header refactoring.
  15. 11 Aug, 2011 1 commit
    • Change the autovacuum launcher to use WaitLatch instead of a poll loop. · 4dab3d5a
      Tom Lane committed
      In pursuit of this (and with the expectation that WaitLatch will be needed
      in more places), convert the latch field that was already added to PGPROC
      for sync rep into a generic latch that is activated for all PGPROC-owning
      processes, and change many of the standard backend signal handlers to set
      that latch when a signal happens.  This will allow WaitLatch callers to be
      wakened properly by these signals.
      
      In passing, fix a whole bunch of signal handlers that had been hacked to do
      things that might change errno, without adding the necessary save/restore
      logic for errno.  Also make some minor fixes in unix_latch.c, and clean
      up bizarre and unsafe scheme for disowning the process's latch.  Much of
      this has to be back-patched into 9.1.
      
      Peter Geoghegan, with additional work by Tom
  16. 10 Aug, 2011 1 commit
    • Documentation improvement and minor code cleanups for the latch facility. · 4e15a4db
      Tom Lane committed
      Improve the documentation around weak-memory-ordering risks, and do a pass
      of general editorialization on the comments in the latch code.  Make the
      Windows latch code more like the Unix latch code where feasible; in
      particular provide the same Assert checks in both implementations.
      Fix poorly-placed WaitLatch call in syncrep.c.
      
      This patch resolves, for the moment, concerns around weak-memory-ordering
      bugs in latch-related code: we have documented the restrictions and checked
      that existing calls meet them.  In 9.2 I hope that we will install suitable
      memory barrier instructions in SetLatch/ResetLatch, so that their callers
      don't need to be quite so careful.
  17. 03 Aug, 2011 2 commits
    • Move CheckRecoveryConflictDeadlock() call to a safer place. · ac36e6f7
      Tom Lane committed
      This kluge was inserted in a spot apparently chosen at random: the lock
      manager's state is not yet fully set up for the wait, and in particular
      LockWaitCancel hasn't been armed by setting lockAwaited, so the ProcLock
      will not get cleaned up if the ereport is thrown.  This seems to not cause
      any observable problem in trivial test cases, because LockReleaseAll will
      silently clean up the debris; but I was able to cause failures with tests
      involving subtransactions.
      
      Fixes breakage induced by commit c85c9414.
      Back-patch to all affected branches.
    • Fix incorrect initialization of ProcGlobal->startupBufferPinWaitBufId. · 2e53bd55
      Tom Lane committed
      It was initialized in the wrong place and to the wrong value.  With bad
      luck this could result in incorrect query-cancellation failures in hot
      standby sessions, should a HS backend be holding pin on buffer number 1
      while trying to acquire a lock.
  18. 18 Jul, 2011 1 commit
    • Create a "fast path" for acquiring weak relation locks. · 3cba8999
      Robert Haas committed
      When an AccessShareLock, RowShareLock, or RowExclusiveLock is requested
      on an unshared database relation, and we can verify that no conflicting
      locks can possibly be present, record the lock in a per-backend queue,
      stored within the PGPROC, rather than in the primary lock table.  This
      eliminates a great deal of contention on the lock manager LWLocks.
      
      This patch also refactors the interface between GetLockStatusData() and
      pg_lock_status() to be a bit more abstract, so that we don't rely so
      heavily on the lock manager's internal representation details.  The new
      fast path lock structures don't have a LOCK or PROCLOCK structure to
      return, so we mustn't depend on that for purposes of listing outstanding
      locks.
      
      Review by Jeff Davis.
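
The eligibility rule described above (a weak lock mode, an unshared relation, and no possible conflicting lock) can be sketched as a single predicate. The enum and function names below are hypothetical, and representing "no conflicting locks can possibly be present" as a strong-lock counter that must be zero is an assumption made for illustration, not a description of the patch's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock modes, ordered from weakest to strongest. */
typedef enum SketchLockMode
{
    SK_ACCESS_SHARE = 1,
    SK_ROW_SHARE,
    SK_ROW_EXCLUSIVE,
    SK_SHARE_UPDATE_EXCLUSIVE,
    SK_SHARE,
    SK_SHARE_ROW_EXCLUSIVE,
    SK_EXCLUSIVE,
    SK_ACCESS_EXCLUSIVE
} SketchLockMode;

/* The three weakest modes are the ones named in the commit message. */
static bool
is_weak_mode(SketchLockMode mode)
{
    return mode < SK_SHARE_UPDATE_EXCLUSIVE;
}

/*
 * A request may use the per-backend fast path only if it is a weak mode,
 * the relation is not shared across databases, and no strong lock that
 * could conflict has been registered (assumed here to be tracked by a
 * simple counter).
 */
static bool
fast_path_eligible(SketchLockMode mode, bool rel_is_shared,
                   unsigned strong_lock_count)
{
    return is_weak_mode(mode) && !rel_is_shared && strong_lock_count == 0;
}
```

Any request that fails this test would fall back to the primary lock table, so the fast path never has to handle conflicts itself.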
  19. 29 Jun, 2011 1 commit
  20. 19 Jun, 2011 1 commit
  21. 17 Jun, 2011 1 commit
    • Fix minor thinko in ProcGlobalShmemSize(). · c573486c
      Robert Haas committed
      There's no need to add space for startupBufferPinWaitBufId, because
      it's part of the PROC_HDR object for which this function already
      allocates space.
      
      This has been wrong for a while, but the only consequence is that our
      shared memory allocation is increased by 4 bytes, so no back-patch.
  22. 12 Jun, 2011 1 commit
    • Code cleanup for InitProcGlobal. · 47ebcecc
      Robert Haas committed
      The old code creates three separate arrays when only one is needed,
      using two different shmem allocation functions for no obvious reason.
      It also strangely splits up the initialization of AuxiliaryProcs
      between the top and bottom of the function to no evident purpose.
      
      Review by Tom Lane.
  23. 07 Mar, 2011 1 commit
    • Efficient transaction-controlled synchronous replication. · a8a8a3e0
      Simon Riggs committed
      If a standby is broadcasting reply messages and we have named
      one or more standbys in synchronous_standby_names then allow
      users who set synchronous_replication to wait for commit, which
      then provides strict data integrity guarantees. Design avoids
      sending and receiving transaction state information so minimises
      bookkeeping overheads. We synchronize with the highest priority
      standby that is connected and ready to synchronize. Other standbys
      can be defined to takeover in case of standby failure.
      
      This version has very strict behaviour; more relaxed options
      may be added at a later date.
      
      Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime
      Casanova, Heikki Linnakangas and Robert Haas, plus the assistance
      of many other design reviewers.
  24. 18 Feb, 2011 1 commit
    • Add transaction-level advisory locks. · 62c7bd31
      Itagaki Takahiro committed
      They share the same locking namespace with the existing session-level
      advisory locks, but they are automatically released at the end of the
      current transaction and cannot be released explicitly via unlock
      functions.
      
      Marko Tiikkaja, reviewed by me.
  25. 02 Jan, 2011 1 commit
  26. 21 Sep, 2010 1 commit
  27. 24 Aug, 2010 1 commit
    • Marginal code cleanup for streaming replication. · b9defe04
      Tom Lane committed
      There is no reason that proc.c should have to get involved in this dirty hack
      for letting the postmaster know which children are walsenders.  Revert that
      file to the way it was, and confine the kluge to pmsignal.c and postmaster.c.
  28. 07 Jul, 2010 1 commit
  29. 04 Jul, 2010 1 commit
    • Replace max_standby_delay with two parameters, max_standby_archive_delay and · e76c1a0f
      Tom Lane committed
      max_standby_streaming_delay, and revise the implementation to avoid assuming
      that timestamps found in WAL records can meaningfully be compared to clock
      time on the standby server.  Instead, the delay limits are compared to the
      elapsed time since we last obtained a new WAL segment from archive or since
      we were last "caught up" to WAL data arriving via streaming replication.
      This avoids problems with clock skew between primary and standby, as well
      as other corner cases that the original coding would misbehave in, such
      as the primary server having significant idle time between transactions.
      Per my complaint some time ago and considerable ensuing discussion.
      
      Do some desultory editing on the hot standby documentation, too.
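
The revised delay check compares elapsed time on the standby's own clock rather than timestamps carried in WAL records. A minimal sketch of that comparison follows; the function name and millisecond units are hypothetical, and treating a negative limit as "wait forever" mirrors the documented meaning of -1 for these parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a conflicting standby query has been tolerated long enough.
 * Both timestamps are read from the standby's own clock, so clock skew
 * between primary and standby cannot affect the result.
 */
static bool
standby_delay_exceeded(int64_t now_ms, int64_t last_wal_arrival_ms,
                       int max_delay_ms)
{
    if (max_delay_ms < 0)
        return false;           /* negative limit: wait forever */
    return (now_ms - last_wal_arrival_ms) >= max_delay_ms;
}
```

Here `last_wal_arrival_ms` stands for either the time the last segment was fetched from archive or the time streaming replication was last caught up, depending on which of the two parameters applies.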
  30. 27 May, 2010 1 commit
    • HS Defer buffer pin deadlock check until deadlock_timeout has expired. · f9dbac94
      Simon Riggs committed
      During Hot Standby we need to check for buffer pin deadlocks when the
      Startup process begins to wait, in case it never wakes up again. We
      previously made the deadlock check immediately on the basis it was
      cheap, though clearer thinking and prima facie evidence show that
      was too simple. Refactor existing code to make it easy to add in
      deferral of deadlock check until deadlock_timeout allowing a good
      reduction in deadlock checks since far fewer buffer pins are held for
      that duration. It's worth doing anyway, though major goal is to
      prevent further reports of context switching with high numbers of
      users on occasional tests.
  31. 29 Apr, 2010 1 commit
    • Modify ShmemInitStruct and ShmemInitHash to throw errors internally, · 77acab75
      Tom Lane committed
      rather than returning NULL for some-but-not-all failures as they used to.
      Remove now-redundant tests for NULL from call sites.
      
      We had to do something about this because many call sites were failing to
      check for NULL; and changing it like this seems a lot more useful and
      mistake-proof than adding checks to the call sites without them.
  32. 26 Feb, 2010 1 commit
  33. 13 Feb, 2010 1 commit
    • Re-enable max_standby_delay = -1 using deadlock detection on startup · b95a720a
      Simon Riggs committed
      process. If startup waits on a buffer pin we send a request to all
      backends to cancel themselves if they are holding the buffer pin
      required and they are also waiting on a lock. If not, startup waits
      until max_standby_delay before cancelling any backend waiting for
      the requested buffer pin.
  34. 08 Feb, 2010 1 commit
    • Remove old-style VACUUM FULL (which was known for a little while as · 0a469c87
      Tom Lane committed
      VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
      Per discussion, the use case for this method of vacuuming is no longer large
      enough to justify maintaining it; not to mention that we don't wish to invest
      the work that would be needed to make it play nicely with Hot Standby.
      
      Aside from the code directly related to old-style VACUUM FULL, this commit
      removes support for certain WAL record types that could only be generated
      within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
      nontransactional generation of cache invalidation sinval messages (the last
      being the sticking point for Hot Standby).
      
      We still have to retain all code that copes with finding HEAP_MOVED_OFF and
      HEAP_MOVED_IN flag bits on existing tuples.  This can't be removed as long
      as we want to support in-place update from pre-9.0 databases.
  35. 24 Jan, 2010 1 commit
    • In HS, Startup process sets SIGALRM when waiting for buffer pin. If · 959ac58c
      Simon Riggs committed
      woken by alarm we send SIGUSR1 to all backends requesting that they
      check to see if they are blocking Startup process. If so, they throw
      ERROR/FATAL as for other conflict resolutions. Deadlock stop gap
      removed. max_standby_delay = -1 option removed to prevent deadlock.
  36. 16 Jan, 2010 1 commit
    • Teach standby conflict resolution to use SIGUSR1 · a8ce974c
      Simon Riggs committed
      Conflict reason is passed through directly to the backend, so we can
      take decisions about the effect of the conflict based upon the local
      state. No specific changes, as yet, though this prepares for later work.
      CancelVirtualTransaction() sends signals while holding ProcArrayLock.
      Introduce errdetail_abort() to give message detail explaining that the
      abort was caused by conflict processing. Remove CONFLICT_MODE states
      in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.
  37. 15 Jan, 2010 1 commit
    • Introduce Streaming Replication. · 40f908bd
      Heikki Linnakangas committed
      This includes two new kinds of postmaster processes, walsenders and
      walreceiver. Walreceiver is responsible for connecting to the primary server
      and streaming WAL to disk, while walsender runs in the primary server and
      streams WAL from disk to the client.
      
      Documentation still needs work, but the basics are there. We will probably
      pull the replication section to a new chapter later on, as well as the
      sections describing file-based replication. But let's do that as a separate
      patch, so that it's easier to see what has been added/changed. This patch
      also adds a new section to the chapter about FE/BE protocol, documenting the
      protocol used by walsender/walreceiver.
      
      Bump catalog version because of two new functions,
      pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
      monitoring the progress of replication.
      
      Fujii Masao, with additional hacking by me
  38. 03 Jan, 2010 1 commit
  39. 19 Dec, 2009 1 commit
    • Allow read only connections during recovery, known as Hot Standby. · efc16ea5
      Simon Riggs committed
      Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
      
      New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
      
      This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
      
      Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
      
      Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.