  1. 08 Feb, 2010 1 commit
    • Tom Lane committed · 0a469c87
      Remove old-style VACUUM FULL (which was known for a little while as
      VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
      Per discussion, the use case for this method of vacuuming is no longer large
      enough to justify maintaining it; not to mention that we don't wish to invest
      the work that would be needed to make it play nicely with Hot Standby.
      
      Aside from the code directly related to old-style VACUUM FULL, this commit
      removes support for certain WAL record types that could only be generated
      within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
      nontransactional generation of cache invalidation sinval messages (the last
      being the sticking point for Hot Standby).
      
      We still have to retain all code that copes with finding HEAP_MOVED_OFF and
      HEAP_MOVED_IN flag bits on existing tuples.  This can't be removed as long
      as we want to support in-place update from pre-9.0 databases.
  2. 24 Jan, 2010 1 commit
    • Simon Riggs committed · 959ac58c
      In HS, Startup process sets SIGALRM when waiting for buffer pin. If
      woken by alarm we send SIGUSR1 to all backends requesting that they
      check to see if they are blocking Startup process. If so, they throw
      ERROR/FATAL as for other conflict resolutions. Deadlock stop gap
      removed. max_standby_delay = -1 option removed to prevent deadlock.
  3. 16 Jan, 2010 1 commit
    • Simon Riggs committed · a8ce974c
      Teach standby conflict resolution to use SIGUSR1
      Conflict reason is passed through directly to the backend, so we can
      take decisions about the effect of the conflict based upon the local
      state. No specific changes, as yet, though this prepares for later work.
      CancelVirtualTransaction() sends signals while holding ProcArrayLock.
      Introduce errdetail_abort() to give message detail explaining that the
      abort was caused by conflict processing. Remove CONFLICT_MODE states
      in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.
  4. 15 Jan, 2010 1 commit
    • Heikki Linnakangas committed · 40f908bd
      Introduce Streaming Replication.
      This includes two new kinds of postmaster processes, walsenders and
      walreceiver. Walreceiver is responsible for connecting to the primary server
      and streaming WAL to disk, while walsender runs in the primary server and
      streams WAL from disk to the client.
      
      Documentation still needs work, but the basics are there. We will probably
      pull the replication section to a new chapter later on, as well as the
      sections describing file-based replication. But let's do that as a separate
      patch, so that it's easier to see what has been added/changed. This patch
      also adds a new section to the chapter about FE/BE protocol, documenting the
      protocol used by walsender/walreceiver.
      
      Bump catalog version because of two new functions,
      pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
      monitoring the progress of replication.
      
      Fujii Masao, with additional hacking by me
  5. 03 Jan, 2010 1 commit
  6. 19 Dec, 2009 1 commit
    • Simon Riggs committed · efc16ea5
      Allow read only connections during recovery, known as Hot Standby.
      Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
      
      New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
      
      This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
      
      Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
      
      Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
  7. 01 Sep, 2009 1 commit
    • Tom Lane committed · 00e6a16d
      Change the autovacuum launcher to read pg_database directly, rather than
      via the "flat files" facility.  This requires making it enough like a backend
      to be able to run transactions; it's no longer an "auxiliary process" but
      more like the autovacuum worker processes.  Also, its signal handling has
      to be brought into line with backends/workers.  In particular, since it
      now has to handle procsignal.c processing, the special autovac-launcher-only
      signal conditions are moved to SIGUSR2.
      
      Alvaro, with some cleanup from Tom
  8. 13 Aug, 2009 1 commit
    • Tom Lane committed · 04011cc9
      Allow backends to start up without use of the flat-file copy of pg_database.
      To make this work in the base case, pg_database now has a nailed-in-cache
      relation descriptor that is initialized using hardwired knowledge in
      relcache.c.  This means pg_database is added to the set of relations that
      need to have a Schema_pg_xxx macro maintained in pg_attribute.h.  When this
      path is taken, we'll have to do a seqscan of pg_database to find the row
      we need.
      
      In the normal case, we are able to do an indexscan to find the database's row
      by name.  This is made possible by storing a global relcache init file that
      describes only the shared catalogs and their indexes (and therefore is usable
      by all backends in any database).  A new backend loads this cache file,
      finds its database OID after an indexscan on pg_database, and then loads
      the local relcache init file for that database.
      
      This change should effectively eliminate number of databases as a factor
      in backend startup time, even with large numbers of databases.  However,
      the real reason for doing it is as a first step towards getting rid of
      the flat files altogether.  There are still several other sub-projects
      to be tackled before that can happen.
  9. 11 Jun, 2009 1 commit
  10. 06 May, 2009 1 commit
    • Tom Lane committed · 969d7cd4
      Install a "dead man switch" to allow the postmaster to detect cases where
      a backend has done exit(0) or exit(1) without having disengaged itself
      from shared memory.  We are at risk for this whenever third-party code is
      loaded into a backend, since such code might not know it's supposed to go
      through proc_exit() instead.  Also, it is reported that under Windows
      there are ways to externally kill a process that cause the status code
      returned to the postmaster to be indistinguishable from a voluntary exit
      (thank you, Microsoft).  If this does happen then the system is probably
      hosed --- for instance, the dead session might still be holding locks.
      So the best recovery method is to treat this like a backend crash.
      
      The dead man switch is armed for a particular child process when it
      acquires a regular PGPROC, and disarmed when the PGPROC is released;
      these should be the first and last touches of shared memory resources
      in a backend, or close enough anyway.  This choice means there is no
      coverage for auxiliary processes, but I doubt we need that, since they
      shouldn't be executing any user-provided code anyway.
      
      This patch also improves the management of the EXEC_BACKEND
      ShmemBackendArray array a bit, by reducing search costs.
      
      Although this problem is of long standing, the lack of field complaints
      seems to mean it's not critical enough to risk back-patching; at least
      not till we get some more testing of this mechanism.
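
The arm/disarm rule described in this commit message can be illustrated with a small toy model. This is a sketch in Python rather than PostgreSQL's C, and the class and method names (`Postmaster`, `acquire_pgproc`, `child_exited`, and so on) are hypothetical stand-ins, not functions from the patch. It assumes, as the message states, that exit codes 0 and 1 would otherwise count as voluntary exits.

```python
from dataclasses import dataclass, field


@dataclass
class Postmaster:
    """Toy model of the dead man switch; all names are illustrative."""

    # Maps child pid -> True while that child's switch is armed.
    armed: dict = field(default_factory=dict)

    def child_started(self, pid):
        # A freshly forked child has not touched shared memory yet.
        self.armed[pid] = False

    def acquire_pgproc(self, pid):
        # First touch of shared memory resources: arm the switch.
        self.armed[pid] = True

    def release_pgproc(self, pid):
        # Last touch of shared memory resources: disarm the switch.
        self.armed[pid] = False

    def child_exited(self, pid, exit_code):
        was_armed = self.armed.pop(pid)
        # exit(0) or exit(1) looks voluntary, but if the switch is still
        # armed the child never disengaged from shared memory, so the
        # exit is treated like a backend crash.
        if exit_code not in (0, 1) or was_armed:
            return "crash-recovery"
        return "normal"
```

A child that releases its PGPROC before `exit(0)` is reported as `"normal"`; one that exits with the switch still armed is reported as `"crash-recovery"`, whatever its status code.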
  11. 02 Jan, 2009 1 commit
  12. 09 Dec, 2008 2 commits
  13. 03 Nov, 2008 1 commit
    • Tom Lane committed · d7112cfa
      Remove the last vestiges of the MAKE_PTR/MAKE_OFFSET mechanism. We haven't
      allowed different processes to have different addresses for the shmem segment
      in quite a long time, but there were still a few places left that used the
      old coding convention.  Clean them up to reduce confusion and improve the
      compiler's ability to detect pointer type mismatches.
      
      Kris Jurka
  14. 10 Jun, 2008 1 commit
  15. 09 Jun, 2008 1 commit
  16. 27 Jan, 2008 1 commit
    • Tom Lane committed · 6322e844
      Change StatementCancelHandler() to check the DoingCommandRead flag to decide
      whether to execute an immediate interrupt, rather than testing whether
      LockWaitCancel() cancelled a lock wait.  The old way misclassified the case
      where we were blocked in ProcWaitForSignal(), and arguably would misclassify
      any other future additions of new ImmediateInterruptOK states too.  This
      allows reverting the old kluge that gave LockWaitCancel() a return value,
      since no callers care anymore.  Improve comments in the various
      implementations of PGSemaphoreLock() to explain that on some platforms, the
      assumption that semop() exits after a signal is wrong, and so we must ensure
      that the signal handler itself throws elog if we want cancel or die interrupts
      to be effective.  Per testing related to bug #3883, though this patch doesn't
      solve those problems fully.
      
      Perhaps this change should be back-patched, but since pre-8.3 branches aren't
      really relying on autovacuum to respond to SIGINT, it doesn't seem critical
      for them.
  17. 02 Jan, 2008 1 commit
  18. 16 Nov, 2007 1 commit
  19. 27 Oct, 2007 1 commit
  20. 25 Oct, 2007 1 commit
  21. 09 Sep, 2007 1 commit
    • Tom Lane committed · 6bd4f401
      Replace the former method of determining snapshot xmax --- to wit, calling
      ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid"
      variable that is updated during transaction commit or abort.  Since
      latestCompletedXid is written only in places that had to lock ProcArrayLock
      exclusively anyway, and is read only in places that had to lock ProcArrayLock
      shared anyway, it adds no new locking requirements to the system despite being
      cluster-wide.  Moreover, removing ReadNewTransactionId from snapshot
      acquisition eliminates the need to take both XidGenLock and ProcArrayLock at
      the same time.  Since XidGenLock is sometimes held across I/O this can be a
      significant win.  Some preliminary benchmarking suggested that this patch has
      no effect on average throughput but can significantly improve the worst-case
      transaction times seen in pgbench.  Concept by Florian Pflug, implementation
      by Tom Lane.
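
As a rough illustration of the idea in this commit message, the following Python sketch keeps a `latest_completed_xid` value that is updated under the same lock already held at commit or abort, and derives the snapshot's xmax from it. Several things here are assumptions for illustration only: the class and method names, taking xmax as one past the latest completed XID, ignoring XID wraparound, and using a single exclusive `threading.Lock` where the message describes exclusive and shared modes of ProcArrayLock.

```python
import threading


class ProcArray:
    """Toy model; names and details are illustrative assumptions."""

    def __init__(self):
        self.lock = threading.Lock()      # stands in for ProcArrayLock
        self.running = set()              # XIDs of in-progress transactions
        self.latest_completed_xid = 0

    def begin(self, xid):
        with self.lock:
            self.running.add(xid)

    def end(self, xid):
        # Called at commit or abort. The lock is needed here anyway to
        # remove the XID, so tracking the newest completed XID adds no
        # extra locking.
        with self.lock:
            self.running.discard(xid)
            if xid > self.latest_completed_xid:
                self.latest_completed_xid = xid

    def get_snapshot(self):
        # No separate XID-generator lock is taken: xmax comes from the
        # value maintained under the lock we already hold.
        with self.lock:
            xmax = self.latest_completed_xid + 1
            xip = sorted(x for x in self.running if x < xmax)
            return xmax, xip
```

In this model any XID at or above xmax is treated as still in progress by definition, so only running XIDs below xmax need to be listed in the snapshot.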
  22. 06 Sep, 2007 1 commit
    • Tom Lane committed · 295e6398
      Implement lazy XID allocation: transactions that do not modify any database
      rows will normally never obtain an XID at all.  We already did things this way
      for subtransactions, but this patch extends the concept to top-level
      transactions.  In applications where there are lots of short read-only
      transactions, this should improve performance noticeably; not so much from
      removal of the actual XID-assignments, as from reduction of overhead that's
      driven by the rate of XID consumption.  We add a concept of a "virtual
      transaction ID" so that active transactions can be uniquely identified even
      if they don't have a regular XID.  This is a much lighter-weight concept:
      uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
      record is made about them.
      
      Florian Pflug, with some editorialization by Tom.
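
The lazy allocation scheme described above might be modelled as follows. This is a minimal Python sketch, not the patch's C code: the `Backend` class, its methods, the starting XID of 100, and the representation of a virtual transaction ID as a `(backend_id, local_counter)` pair are all assumptions made for illustration.

```python
import itertools


class Backend:
    """Toy model of lazy XID allocation; all names are hypothetical."""

    # Shared counter standing in for the cluster-wide XID generator.
    _next_xid = itertools.count(100)

    def __init__(self, backend_id):
        self.backend_id = backend_id
        self._local = itertools.count(1)  # per-backend, never written to disk
        self.vxid = None
        self.xid = None

    def begin(self):
        # Every transaction gets a cheap virtual ID immediately.
        self.vxid = (self.backend_id, next(self._local))
        self.xid = None

    def read(self):
        # Reading rows never forces XID assignment.
        return self.xid

    def write(self):
        # The first row modification assigns a regular XID; later
        # modifications in the same transaction reuse it.
        if self.xid is None:
            self.xid = next(Backend._next_xid)
        return self.xid
```

A read-only transaction in this model finishes with `xid` still `None`, so it consumes nothing from the shared counter, which is the source of the overhead reduction the message describes.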
  23. 28 Aug, 2007 1 commit
    • Tom Lane committed · 24d4517b
      Improve behavior of log_lock_waits patch. Ensure that something gets logged
      even if the "deadlock detected" ERROR message is suppressed by an exception
      catcher.  Be clearer about the event sequence when a soft deadlock is fixed:
      the fixing process might or might not still have to wait, so log that
      separately.  Fix race condition when someone releases us from the lock partway
      through printing all this junk --- we'd not get confused about our state, but
      the log message sequence could have been misleading, ie, a "still waiting"
      message with no subsequent "acquired" message.  Greg Stark and Tom Lane.
  24. 17 Jul, 2007 1 commit
  25. 20 Jun, 2007 2 commits
    • Tom Lane committed · 9cce91db
      Only log 'process acquired lock' if we actually did get the lock. This
      test seems inessential right now since the only control path for not
      getting the lock is via CHECK_FOR_INTERRUPTS which won't return control
      to ProcSleep, but it would be important if we ever allow the deadlock
      code to kill someone else's transaction instead of our own.
    • Tom Lane committed · 6e072287
      Code review for log_lock_waits patch. Don't try to issue log messages from
      within a signal handler (this might be safe given the relatively narrow code
      range in which the interrupt is enabled, but it seems awfully risky); do issue
      more informative log messages that tell what is being waited for and the exact
      length of the wait; minor other code cleanup.  Greg Stark and Tom Lane
  26. 17 Apr, 2007 1 commit
    • Alvaro Herrera committed · e2a186b0
      Add a multi-worker capability to autovacuum. This allows multiple worker
      processes to be running simultaneously.  Also, now autovacuum processes do not
      count towards the max_connections limit; they are counted separately from
      regular processes, and are limited by the new GUC variable
      autovacuum_max_workers.
      
      The launcher now has intelligence to launch workers on each database every
      autovacuum_naptime seconds, limited only on the max amount of worker slots
      available.
      
      Also, the global worker I/O utilization is limited by the vacuum cost-based
      delay feature.  Workers are "balanced" so that the total I/O consumption does
      not exceed the established limit.  This part of the patch was contributed by
      ITAGAKI Takahiro.
      
      Per discussion.
  27. 04 Apr, 2007 1 commit
    • Tom Lane committed · 9c9b6194
      Remove the CheckpointStartLock in favor of having backends show whether they
      are in their commit critical sections via flags in the ProcArray.  Checkpoint
      can watch the ProcArray to determine when it's safe to proceed.  This is
      a considerably better solution to the original problem of race conditions
      between checkpoint and transaction commit: it speeds up commit, since there's
      one less lock to fool with, and it prevents the problem of checkpoint being
      delayed indefinitely when there's a constant flow of commits.  Heikki, with
      some kibitzing from Tom.
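
One way to picture the flag-based scheme in this commit message is the Python sketch below. It is a toy model under stated assumptions: the `CommitFlags` class and its methods are hypothetical, and the specific policy shown (the checkpoint records which transactions are flagged at the moment it starts and waits only for those) is an assumed reading of "watch the ProcArray", chosen because it shows why a steady stream of new commits cannot delay the checkpoint indefinitely.

```python
class CommitFlags:
    """Toy model of per-backend 'in commit critical section' flags."""

    def __init__(self):
        self.committing = set()   # transaction ids currently flagged

    def start_commit(self, xid):
        # Backend sets its flag on entering the commit critical section.
        self.committing.add(xid)

    def finish_commit(self, xid):
        # Backend clears its flag on leaving the critical section.
        self.committing.discard(xid)

    def snapshot_committing(self):
        # Checkpoint records who is mid-commit right now.
        return set(self.committing)

    def any_still_committing(self, xids):
        # Checkpoint waits only while one of the recorded transactions
        # is still flagged; commits that started later are ignored.
        return bool(self.committing & xids)
```

Committing backends take no extra lock in this model; they only flip a flag, which matches the message's point that commit gets cheaper as well.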
  28. 07 Mar, 2007 1 commit
  29. 04 Mar, 2007 1 commit
  30. 16 Feb, 2007 1 commit
    • Alvaro Herrera committed · 18206509
      Restructure autovacuum in two processes: a dummy process, which runs
      continuously, and requests vacuum runs of "autovacuum workers" to postmaster.
      The workers do the actual vacuum work.  This allows for future improvements,
      like allowing multiple autovacuum jobs running in parallel.
      
      For now, the code keeps the original behavior of having a single autovac
      process at any time by sleeping until the previous worker has finished.
  31. 16 Jan, 2007 1 commit
  32. 06 Jan, 2007 1 commit
  33. 22 Nov, 2006 1 commit
    • Tom Lane committed · 3ad0728c
      On systems that have setsid(2) (which should be just about everything except
      Windows), arrange for each postmaster child process to be its own process
      group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
      process group not only the direct child process.  This provides saner behavior
      for archive and recovery scripts; in particular, it's possible to shut down a
      warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
      of SIGQUIT to the startup subprocess will result in killing the waiting
      recovery_command.  Also, this makes Query Cancel and statement_timeout apply
      to scripts being run from backends via system().  (There is no support in the
      core backend for that, but it's widely done using untrusted PLs.)  Per gripe
      from Stephen Harris and subsequent discussion.
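
The process-group behaviour described here can be tried out with a short POSIX-only Python sketch. It is not the postmaster's C code: the helper names `spawn_group_leader` and `signal_group` are hypothetical, and `subprocess.Popen(..., start_new_session=True)` is used as a convenient way to have the child call `setsid()` before it executes the command.

```python
import os
import signal
import subprocess


def spawn_group_leader(argv):
    """Start a child that becomes its own session and process group leader.

    start_new_session=True makes the child call setsid() before exec,
    so its process group id equals its pid.
    """
    return subprocess.Popen(argv, start_new_session=True)


def signal_group(proc, sig):
    """Deliver sig to the child's whole process group, not just the child.

    Anything the child started (for example via system()) shares the
    group and receives the signal as well.
    """
    os.killpg(proc.pid, sig)
```

For example, after `p = spawn_group_leader(["sh", "-c", "sleep 30 & wait"])`, calling `signal_group(p, signal.SIGTERM)` signals both the shell and the `sleep` it launched, which is the effect the message relies on for stopping a waiting recovery script.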
  34. 04 Oct, 2006 1 commit
  35. 30 Jul, 2006 1 commit
  36. 24 Jul, 2006 1 commit
  37. 14 Jul, 2006 2 commits