1. 18 Feb, 2011 1 commit
    • Add transaction-level advisory locks. · 62c7bd31
      Itagaki Takahiro committed
      They share the same locking namespace with the existing session-level
      advisory locks, but they are automatically released at the end of the
      current transaction and cannot be released explicitly via unlock
      functions.
      
      Marko Tiikkaja, reviewed by me.
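
The lifetime rules described in this commit can be illustrated with a toy model. This is only a sketch, not PostgreSQL code: the class `ToyAdvisoryLocks` and its methods are hypothetical names, and it assumes one holder per key with no re-entrancy.

```python
class ToyAdvisoryLocks:
    """Toy model (hypothetical): one key namespace, two lock lifetimes."""

    def __init__(self):
        # key -> "session" or "xact"; a single dict models the shared namespace
        self.held = {}

    def lock_session(self, key):
        """Take a session-level lock; returns False if the key is already held."""
        if key in self.held:
            return False
        self.held[key] = "session"
        return True

    def lock_xact(self, key):
        """Take a transaction-level lock in the same namespace."""
        if key in self.held:
            return False
        self.held[key] = "xact"
        return True

    def unlock(self, key):
        """Explicit unlock succeeds only for session-level locks."""
        if self.held.get(key) == "session":
            del self.held[key]
            return True
        return False

    def end_transaction(self):
        """Commit or abort: drop transaction-level locks, keep session-level ones."""
        self.held = {k: v for k, v in self.held.items() if v != "xact"}
```

In this model a transaction-level lock on a key blocks a session-level request for the same key and vice versa, which is what sharing a namespace means here.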
  2. 02 Jan, 2011 1 commit
  3. 21 Sep, 2010 1 commit
  4. 24 Aug, 2010 1 commit
    • Marginal code cleanup for streaming replication. · b9defe04
      Tom Lane committed
      There is no reason that proc.c should have to get involved in this dirty hack
      for letting the postmaster know which children are walsenders.  Revert that
      file to the way it was, and confine the kluge to pmsignal.c and postmaster.c.
  5. 07 Jul, 2010 1 commit
  6. 04 Jul, 2010 1 commit
    • Replace max_standby_delay with two parameters, max_standby_archive_delay and · e76c1a0f
      Tom Lane committed
      max_standby_streaming_delay, and revise the implementation to avoid assuming
      that timestamps found in WAL records can meaningfully be compared to clock
      time on the standby server.  Instead, the delay limits are compared to the
      elapsed time since we last obtained a new WAL segment from archive or since
      we were last "caught up" to WAL data arriving via streaming replication.
      This avoids problems with clock skew between primary and standby, as well
      as other corner cases that the original coding would misbehave in, such
      as the primary server having significant idle time between transactions.
      Per my complaint some time ago and considerable ensuing discussion.
      
      Do some desultory editing on the hot standby documentation, too.
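
The revised comparison can be sketched as follows. This is a minimal illustration rather than the actual implementation: the function names and arguments are hypothetical, times are assumed to come from the standby's own clock, and treating a negative limit as "wait forever" is an assumption of this sketch.

```python
def pick_limit(source, archive_limit, streaming_limit):
    """Choose the delay limit by where the WAL came from (hypothetical helper)."""
    return archive_limit if source == "archive" else streaming_limit


def standby_delay_exceeded(now, last_wal_receipt, limit_seconds):
    """Return True once a conflicting standby query has used up its grace period.

    now, last_wal_receipt: seconds read from the standby's own clock, so no
        timestamp from the primary's WAL records is involved.
    limit_seconds: the delay limit; a negative value is treated here as
        "wait forever" (an assumption of this sketch).
    """
    if limit_seconds < 0:
        return False
    return (now - last_wal_receipt) >= limit_seconds
```

Because both inputs are local readings, clock skew between primary and standby, or a long idle period on the primary, does not change the result.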
  7. 27 May, 2010 1 commit
    • HS Defer buffer pin deadlock check until deadlock_timeout has expired. · f9dbac94
      Simon Riggs committed
      During Hot Standby we need to check for buffer pin deadlocks when the
      Startup process begins to wait, in case it never wakes up again. We
      previously made the deadlock check immediately on the basis it was
      cheap, though clearer thinking and prima facie evidence show that
      was too simple. Refactor existing code to make it easy to add in
      deferral of deadlock check until deadlock_timeout, allowing a good
      reduction in deadlock checks since far fewer buffer pins are held for
      that duration. It's worth doing anyway, though the major goal is to
      prevent further reports of context switching with high numbers of
      users on occasional tests.
  8. 29 Apr, 2010 1 commit
    • Modify ShmemInitStruct and ShmemInitHash to throw errors internally, · 77acab75
      Tom Lane committed
      rather than returning NULL for some-but-not-all failures as they used to.
      Remove now-redundant tests for NULL from call sites.
      
      We had to do something about this because many call sites were failing to
      check for NULL; and changing it like this seems a lot more useful and
      mistake-proof than adding checks to the call sites without them.
  9. 26 Feb, 2010 1 commit
  10. 13 Feb, 2010 1 commit
    • Re-enable max_standby_delay = -1 using deadlock detection on startup · b95a720a
      Simon Riggs committed
      process. If startup waits on a buffer pin we send a request to all
      backends to cancel themselves if they are holding the buffer pin
      required and they are also waiting on a lock. If not, startup waits
      until max_standby_delay before cancelling any backend waiting for
      the requested buffer pin.
  11. 08 Feb, 2010 1 commit
    • Remove old-style VACUUM FULL (which was known for a little while as · 0a469c87
      Tom Lane committed
      VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
      Per discussion, the use case for this method of vacuuming is no longer large
      enough to justify maintaining it; not to mention that we don't wish to invest
      the work that would be needed to make it play nicely with Hot Standby.
      
      Aside from the code directly related to old-style VACUUM FULL, this commit
      removes support for certain WAL record types that could only be generated
      within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
      nontransactional generation of cache invalidation sinval messages (the last
      being the sticking point for Hot Standby).
      
      We still have to retain all code that copes with finding HEAP_MOVED_OFF and
      HEAP_MOVED_IN flag bits on existing tuples.  This can't be removed as long
      as we want to support in-place update from pre-9.0 databases.
  12. 24 Jan, 2010 1 commit
    • In HS, Startup process sets SIGALRM when waiting for buffer pin. If · 959ac58c
      Simon Riggs committed
      woken by alarm we send SIGUSR1 to all backends requesting that they
      check to see if they are blocking Startup process. If so, they throw
      ERROR/FATAL as for other conflict resolutions. Deadlock stop gap
      removed. max_standby_delay = -1 option removed to prevent deadlock.
  13. 16 Jan, 2010 1 commit
    • Teach standby conflict resolution to use SIGUSR1 · a8ce974c
      Simon Riggs committed
      Conflict reason is passed through directly to the backend, so we can
      take decisions about the effect of the conflict based upon the local
      state. No specific changes, as yet, though this prepares for later work.
      CancelVirtualTransaction() sends signals while holding ProcArrayLock.
      Introduce errdetail_abort() to give message detail explaining that the
      abort was caused by conflict processing. Remove CONFLICT_MODE states
      in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.
  14. 15 Jan, 2010 1 commit
    • Introduce Streaming Replication. · 40f908bd
      Heikki Linnakangas committed
      This includes two new kinds of postmaster processes, walsenders and
      walreceiver. Walreceiver is responsible for connecting to the primary server
      and streaming WAL to disk, while walsender runs in the primary server and
      streams WAL from disk to the client.
      
      Documentation still needs work, but the basics are there. We will probably
      pull the replication section to a new chapter later on, as well as the
      sections describing file-based replication. But let's do that as a separate
      patch, so that it's easier to see what has been added/changed. This patch
      also adds a new section to the chapter about FE/BE protocol, documenting the
      protocol used by walsender/walreceiver.
      
      Bump catalog version because of two new functions,
      pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
      monitoring the progress of replication.
      
      Fujii Masao, with additional hacking by me
  15. 03 Jan, 2010 1 commit
  16. 19 Dec, 2009 1 commit
    • Allow read only connections during recovery, known as Hot Standby. · efc16ea5
      Simon Riggs committed
      Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
      
      New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
      
      This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
      
      Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
      
      Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
  17. 01 Sep, 2009 1 commit
    • Change the autovacuum launcher to read pg_database directly, rather than · 00e6a16d
      Tom Lane committed
      via the "flat files" facility.  This requires making it enough like a backend
      to be able to run transactions; it's no longer an "auxiliary process" but
      more like the autovacuum worker processes.  Also, its signal handling has
      to be brought into line with backends/workers.  In particular, since it
      now has to handle procsignal.c processing, the special autovac-launcher-only
      signal conditions are moved to SIGUSR2.
      
      Alvaro, with some cleanup from Tom
  18. 13 Aug, 2009 1 commit
    • Allow backends to start up without use of the flat-file copy of pg_database. · 04011cc9
      Tom Lane committed
      To make this work in the base case, pg_database now has a nailed-in-cache
      relation descriptor that is initialized using hardwired knowledge in
      relcache.c.  This means pg_database is added to the set of relations that
      need to have a Schema_pg_xxx macro maintained in pg_attribute.h.  When this
      path is taken, we'll have to do a seqscan of pg_database to find the row
      we need.
      
      In the normal case, we are able to do an indexscan to find the database's row
      by name.  This is made possible by storing a global relcache init file that
      describes only the shared catalogs and their indexes (and therefore is usable
      by all backends in any database).  A new backend loads this cache file,
      finds its database OID after an indexscan on pg_database, and then loads
      the local relcache init file for that database.
      
      This change should effectively eliminate number of databases as a factor
      in backend startup time, even with large numbers of databases.  However,
      the real reason for doing it is as a first step towards getting rid of
      the flat files altogether.  There are still several other sub-projects
      to be tackled before that can happen.
  19. 11 Jun, 2009 1 commit
  20. 06 May, 2009 1 commit
    • Install a "dead man switch" to allow the postmaster to detect cases where · 969d7cd4
      Tom Lane committed
      a backend has done exit(0) or exit(1) without having disengaged itself
      from shared memory.  We are at risk for this whenever third-party code is
      loaded into a backend, since such code might not know it's supposed to go
      through proc_exit() instead.  Also, it is reported that under Windows
      there are ways to externally kill a process that cause the status code
      returned to the postmaster to be indistinguishable from a voluntary exit
      (thank you, Microsoft).  If this does happen then the system is probably
      hosed --- for instance, the dead session might still be holding locks.
      So the best recovery method is to treat this like a backend crash.
      
      The dead man switch is armed for a particular child process when it
      acquires a regular PGPROC, and disarmed when the PGPROC is released;
      these should be the first and last touches of shared memory resources
      in a backend, or close enough anyway.  This choice means there is no
      coverage for auxiliary processes, but I doubt we need that, since they
      shouldn't be executing any user-provided code anyway.
      
      This patch also improves the management of the EXEC_BACKEND
      ShmemBackendArray array a bit, by reducing search costs.
      
      Although this problem is of long standing, the lack of field complaints
      seems to mean it's not critical enough to risk back-patching; at least
      not till we get some more testing of this mechanism.
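
The arming and disarming described above can be sketched with a toy model. This is not the postmaster's code: `ToyPostmaster` and its methods are hypothetical, and treating exit codes other than 0 and 1 as crashes is an assumption of this sketch.

```python
class ToyPostmaster:
    """Toy model (hypothetical) of a per-child dead man switch."""

    def __init__(self):
        # pids of children whose switch is currently armed
        self.armed = set()

    def arm(self, pid):
        """Called when the child acquires a regular PGPROC."""
        self.armed.add(pid)

    def disarm(self, pid):
        """Called when the child releases its PGPROC on the normal exit path."""
        self.armed.discard(pid)

    def classify_exit(self, pid, exit_code):
        """Decide how to treat a child that has just exited."""
        still_armed = pid in self.armed
        self.armed.discard(pid)
        # A voluntary-looking exit (0 or 1) with the switch still armed means
        # the child never disengaged from shared memory: treat it as a crash.
        if still_armed or exit_code not in (0, 1):
            return "crash"
        return "clean"
```

A child that calls `exit(0)` from third-party code without going through the normal exit path never reaches `disarm`, so `classify_exit` reports it as a crash.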
  21. 02 Jan, 2009 1 commit
  22. 09 Dec, 2008 2 commits
  23. 03 Nov, 2008 1 commit
    • Remove the last vestiges of the MAKE_PTR/MAKE_OFFSET mechanism. We haven't · d7112cfa
      Tom Lane committed
      allowed different processes to have different addresses for the shmem segment
      in quite a long time, but there were still a few places left that used the
      old coding convention.  Clean them up to reduce confusion and improve the
      compiler's ability to detect pointer type mismatches.
      
      Kris Jurka
  24. 10 Jun, 2008 1 commit
  25. 09 Jun, 2008 1 commit
  26. 27 Jan, 2008 1 commit
    • Change StatementCancelHandler() to check the DoingCommandRead flag to decide · 6322e844
      Tom Lane committed
      whether to execute an immediate interrupt, rather than testing whether
      LockWaitCancel() cancelled a lock wait.  The old way misclassified the case
      where we were blocked in ProcWaitForSignal(), and arguably would misclassify
      any other future additions of new ImmediateInterruptOK states too.  This
      allows reverting the old kluge that gave LockWaitCancel() a return value,
      since no callers care anymore.  Improve comments in the various
      implementations of PGSemaphoreLock() to explain that on some platforms, the
      assumption that semop() exits after a signal is wrong, and so we must ensure
      that the signal handler itself throws elog if we want cancel or die interrupts
      to be effective.  Per testing related to bug #3883, though this patch doesn't
      solve those problems fully.
      
      Perhaps this change should be back-patched, but since pre-8.3 branches aren't
      really relying on autovacuum to respond to SIGINT, it doesn't seem critical
      for them.
  27. 02 Jan, 2008 1 commit
  28. 16 Nov, 2007 1 commit
  29. 27 Oct, 2007 1 commit
  30. 25 Oct, 2007 1 commit
  31. 09 Sep, 2007 1 commit
    • Replace the former method of determining snapshot xmax --- to wit, calling · 6bd4f401
      Tom Lane committed
      ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid"
      variable that is updated during transaction commit or abort.  Since
      latestCompletedXid is written only in places that had to lock ProcArrayLock
      exclusively anyway, and is read only in places that had to lock ProcArrayLock
      shared anyway, it adds no new locking requirements to the system despite being
      cluster-wide.  Moreover, removing ReadNewTransactionId from snapshot
      acquisition eliminates the need to take both XidGenLock and ProcArrayLock at
      the same time.  Since XidGenLock is sometimes held across I/O this can be a
      significant win.  Some preliminary benchmarking suggested that this patch has
      no effect on average throughput but can significantly improve the worst-case
      transaction times seen in pgbench.  Concept by Florian Pflug, implementation
      by Tom Lane.
  32. 06 Sep, 2007 1 commit
    • Implement lazy XID allocation: transactions that do not modify any database · 295e6398
      Tom Lane committed
      rows will normally never obtain an XID at all.  We already did things this way
      for subtransactions, but this patch extends the concept to top-level
      transactions.  In applications where there are lots of short read-only
      transactions, this should improve performance noticeably; not so much from
      removal of the actual XID-assignments, as from reduction of overhead that's
      driven by the rate of XID consumption.  We add a concept of a "virtual
      transaction ID" so that active transactions can be uniquely identified even
      if they don't have a regular XID.  This is a much lighter-weight concept:
      uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
      record is made about them.
      
      Florian Pflug, with some editorialization by Tom.
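
The idea can be sketched with a toy transaction manager. All names here (`ToyXactManager`, `ToyTransaction`) are hypothetical, the starting XID of 100 is arbitrary, and the real data structures differ; the sketch only shows that a virtual ID exists from the start while a regular XID is handed out on the first write.

```python
import itertools


class ToyXactManager:
    """Toy model (hypothetical): hands out VXIDs eagerly and XIDs lazily."""

    def __init__(self):
        self.next_xid = itertools.count(100)   # arbitrary starting value
        self.next_local = itertools.count(1)   # per-manager local counter

    def begin(self, backend_id):
        # Every transaction gets a virtual ID immediately; nothing is persisted.
        return ToyTransaction(self, vxid=(backend_id, next(self.next_local)))


class ToyTransaction:
    def __init__(self, manager, vxid):
        self.manager = manager
        self.vxid = vxid   # always present, identifies the active transaction
        self.xid = None    # regular XID, assigned only when needed

    def read(self):
        """Read-only work never consumes an XID."""
        return self.xid

    def write(self):
        """The first modification assigns an XID; later writes reuse it."""
        if self.xid is None:
            self.xid = next(self.manager.next_xid)
        return self.xid
```

A workload made up of many short read-only transactions never advances `next_xid` in this model, which is the reduction in XID consumption the commit message refers to.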
  33. 28 Aug, 2007 1 commit
    • Improve behavior of log_lock_waits patch. Ensure that something gets logged · 24d4517b
      Tom Lane committed
      even if the "deadlock detected" ERROR message is suppressed by an exception
      catcher.  Be clearer about the event sequence when a soft deadlock is fixed:
      the fixing process might or might not still have to wait, so log that
      separately.  Fix race condition when someone releases us from the lock partway
      through printing all this junk --- we'd not get confused about our state, but
      the log message sequence could have been misleading, ie, a "still waiting"
      message with no subsequent "acquired" message.  Greg Stark and Tom Lane.
  34. 17 Jul, 2007 1 commit
  35. 20 Jun, 2007 2 commits
    • Only log 'process acquired lock' if we actually did get the lock. This · 9cce91db
      Tom Lane committed
      test seems inessential right now since the only control path for not
      getting the lock is via CHECK_FOR_INTERRUPTS which won't return control
      to ProcSleep, but it would be important if we ever allow the deadlock
      code to kill someone else's transaction instead of our own.
    • Code review for log_lock_waits patch. Don't try to issue log messages from · 6e072287
      Tom Lane committed
      within a signal handler (this might be safe given the relatively narrow code
      range in which the interrupt is enabled, but it seems awfully risky); do issue
      more informative log messages that tell what is being waited for and the exact
      length of the wait; minor other code cleanup.  Greg Stark and Tom Lane
  36. 17 Apr, 2007 1 commit
    • Add a multi-worker capability to autovacuum. This allows multiple worker · e2a186b0
      Alvaro Herrera committed
      processes to be running simultaneously.  Also, now autovacuum processes do not
      count towards the max_connections limit; they are counted separately from
      regular processes, and are limited by the new GUC variable
      autovacuum_max_workers.
      
      The launcher now has intelligence to launch workers on each database every
      autovacuum_naptime seconds, limited only by the maximum number of worker
      slots available.
      
      Also, the global worker I/O utilization is limited by the vacuum cost-based
      delay feature.  Workers are "balanced" so that the total I/O consumption does
      not exceed the established limit.  This part of the patch was contributed by
      ITAGAKI Takahiro.
      
      Per discussion.
  37. 04 Apr, 2007 1 commit
    • Remove the CheckpointStartLock in favor of having backends show whether they · 9c9b6194
      Tom Lane committed
      are in their commit critical sections via flags in the ProcArray.  Checkpoint
      can watch the ProcArray to determine when it's safe to proceed.  This is
      a considerably better solution to the original problem of race conditions
      between checkpoint and transaction commit: it speeds up commit, since there's
      one less lock to fool with, and it prevents the problem of checkpoint being
      delayed indefinitely when there's a constant flow of commits.  Heikki, with
      some kibitzing from Tom.
  38. 07 Mar, 2007 1 commit