1. 25 Nov, 2011: 1 commit
    • Move "hot" members of PGPROC into a separate PGXACT array. · ed0b409d
      Committed by Robert Haas
      This speeds up snapshot-taking and reduces ProcArrayLock contention.
      Also, the PGPROC (and PGXACT) structures used by two-phase commit are
      now allocated as part of the main array, rather than in a separate
      array, and we keep ProcArray sorted in pointer order.  These changes
      are intended to minimize the number of cache lines that must be pulled
      in to take a snapshot, and testing shows a substantial increase in
      performance on both read and write workloads at high concurrencies.
      
      Pavan Deolasee, Heikki Linnakangas, Robert Haas
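      A minimal C sketch of the hot/cold split, assuming hypothetical names (`ColdProcSketch`, `HotXactSketch`, `oldest_running_xid`) rather than PostgreSQL's real PGPROC/PGXACT definitions. The snapshot scan reads only a dense array of small entries, which is the cache-line effect the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;          /* stand-in for PostgreSQL's xid type */
#define INVALID_XID ((TransactionId) 0)  /* hypothetical "no transaction" marker */

/* "Cold" per-process state: bulky and not read while taking a snapshot. */
typedef struct ColdProcSketch
{
    int  pid;
    char other_state[512];   /* placeholder for lock queues, latches, etc. */
} ColdProcSketch;

/* "Hot" per-process state: the only fields the snapshot scan touches. */
typedef struct HotXactSketch
{
    TransactionId xid;       /* running top-level xid, or INVALID_XID */
    TransactionId xmin;
} HotXactSketch;

/*
 * Scan only the dense hot array; the cold array is never touched.
 * Returns the oldest running xid, or next_xid if nothing is running.
 * Wraparound-aware comparison is omitted to keep the sketch short.
 */
static TransactionId
oldest_running_xid(const HotXactSketch *hot, size_t n, TransactionId next_xid)
{
    TransactionId oldest = next_xid;

    for (size_t i = 0; i < n; i++)
    {
        if (hot[i].xid != INVALID_XID && hot[i].xid < oldest)
            oldest = hot[i].xid;
    }
    return oldest;
}
```

      With 8-byte entries, one 64-byte cache line covers eight processes, whereas reading the same two fields from 516-byte structs would touch a separate line per process.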
  2. 19 Nov, 2011: 1 commit
    • Avoid floating-point underflow while tracking buffer allocation rate. · 40d35036
      Committed by Tom Lane
      When the system is idle for a while after activity, the "smoothed_alloc"
      state variable in BgBufferSync converges slowly to zero.  With standard
      IEEE float arithmetic this results in several iterations with denormalized
      values, which causes kernel traps and annoying log messages on some
      poorly-designed platforms.  There's no real need to track such small values
      of smoothed_alloc, so we can prevent the kernel traps by forcing it to zero
      as soon as it's too small to be interesting for our purposes.  This issue
      is purely cosmetic, since the iterations don't happen fast enough for the
      kernel traps to pose any meaningful performance problem, but still it seems
      worth shutting up the log messages.
      
      The kernel log messages were previously reported by a number of people,
      but kudos to Greg Matthews for tracking down exactly where they were coming
      from.
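      The clamp-to-zero idea can be sketched as follows. The simple exponential moving average, the smoothing window of 16 samples and the 1e-10 cutoff are assumptions for illustration, not necessarily BgBufferSync's actual update rule or threshold.

```c
#include <assert.h>
#include <float.h>

#define SMOOTHING_SAMPLES 16       /* assumed smoothing window */
#define NEGLIGIBLE_ALLOC  1e-10f   /* assumed "too small to matter" cutoff */

/*
 * One step of an exponential moving average of buffer allocations.
 * Once the average decays below NEGLIGIBLE_ALLOC it is forced to zero,
 * so it never drifts down into the denormalized range (below FLT_MIN).
 */
static float
update_smoothed_alloc(float smoothed_alloc, int recent_alloc)
{
    smoothed_alloc += ((float) recent_alloc - smoothed_alloc) / SMOOTHING_SAMPLES;

    if (smoothed_alloc < NEGLIGIBLE_ALLOC)
        smoothed_alloc = 0.0f;
    return smoothed_alloc;
}
```

      Without the clamp, an idle system would keep shrinking the value by a factor of 15/16 per call until it reached denormalized values.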
  3. 11 Nov, 2011: 1 commit
  4. 02 Nov, 2011: 4 commits
    • Derive oldestActiveXid at correct time for Hot Standby. · 86e33648
      Committed by Simon Riggs
      There was a timing window, visible only under heavy load, between when
      oldestActiveXid was actually derived and when it should have been
      derived. Move code around to ensure it is derived at the correct time.
      There is no change to the StartupSUBTRANS() code, which is where this failed.
      
      Bug report by Chris Redekop
    • Start Hot Standby faster when initial snapshot is incomplete. · 10b7c686
      Committed by Simon Riggs
      If the initial snapshot had overflowed, we can now start as soon as the
      latest snapshot is empty or not overflowed, or, as we did already, as
      soon as the xmin on the primary is higher than the xmax of our starting
      snapshot, which proves we have full snapshot data.
      
      Bug report by Chris Redekop
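      The start-up condition described above, written as a predicate. `SnapshotSummary` and `standby_has_full_snapshot` are hypothetical names, and the plain `>` comparison ignores xid wraparound, which real code must handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical summary of a running-transactions snapshot from the primary. */
typedef struct SnapshotSummary
{
    bool          overflowed;  /* subtransaction info did not fit */
    int           xcnt;        /* number of running xids listed */
    TransactionId xmin;        /* oldest xid running on the primary */
    TransactionId xmax;        /* first xid not yet assigned */
} SnapshotSummary;

/*
 * Can the standby trust that it has complete snapshot data?
 * Plain comparisons are used; real code must be wraparound-aware.
 */
static bool
standby_has_full_snapshot(const SnapshotSummary *initial,
                          const SnapshotSummary *latest)
{
    if (!initial->overflowed)
        return true;
    /* New rule: an empty or non-overflowed later snapshot is enough. */
    if (latest->xcnt == 0 || !latest->overflowed)
        return true;
    /* Old rule: everything in the initial snapshot has since finished. */
    return latest->xmin > initial->xmax;
}
```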
    • Initialize myProcLocks queues just once, at postmaster startup. · c2891b46
      Committed by Robert Haas
      In assert-enabled builds, we assert during the shutdown sequence that
      the queues have been properly emptied, and during process startup that
      we are inheriting empty queues.  In non-assert-enabled builds, we just
      save a few cycles.
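      A sketch of "initialize once, only check afterwards", assuming a hypothetical circular queue type and proc array in place of PostgreSQL's shared-memory queues. Backends only assert that their queues are empty, so when assertions are compiled out (`NDEBUG`) the per-backend work disappears.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LOCK_PARTITIONS 16   /* assumed number of lock-table partitions */
#define MAX_PROCS           8    /* hypothetical size of the proc array */

/* Minimal circular doubly-linked queue header. */
typedef struct QueueLink
{
    struct QueueLink *prev;
    struct QueueLink *next;
} QueueLink;

static void queue_init(QueueLink *q) { q->prev = q->next = q; }
static bool queue_is_empty(const QueueLink *q) { return q->next == q; }

typedef struct ProcSketch
{
    QueueLink my_proc_locks[NUM_LOCK_PARTITIONS];
} ProcSketch;

static ProcSketch proc_array[MAX_PROCS];   /* stands in for shared memory */

/* Run once, at "postmaster startup". */
static void
init_proc_array(void)
{
    for (int p = 0; p < MAX_PROCS; p++)
        for (int i = 0; i < NUM_LOCK_PARTITIONS; i++)
            queue_init(&proc_array[p].my_proc_locks[i]);
}

/* Run at backend startup and at shutdown: check, never reinitialize. */
static void
check_proc_queues_empty(const ProcSketch *proc)
{
    for (int i = 0; i < NUM_LOCK_PARTITIONS; i++)
        assert(queue_is_empty(&proc->my_proc_locks[i]));
    (void) proc;   /* unused when assertions are compiled out */
}
```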
    • Split work of bgwriter between 2 processes: bgwriter and checkpointer. · 806a2aee
      Committed by Simon Riggs
      bgwriter is now a much less important process, responsible for page
      cleaning duties only. checkpointer is now responsible for checkpoints
      and so has a key role in shutdown. Later patches will correct doc
      references to the now old idea that bgwriter performs checkpoints.
      This has a beneficial effect on performance at high write rates, but it
      is mainly a refactoring that makes changes for power reduction easier, by
      simplifying the previously tortuous code required to let page cleaning
      and checkpointing time-slice within the same process.
      
      Patch by me, review by Dickson Guedes.
  5. 29 Oct, 2011: 1 commit
    • Allow hint bits to be set sooner for temporary and unlogged tables. · 53f1ca59
      Committed by Robert Haas
      We need not wait until the commit record is durably on disk, because
      in the event of a crash the page we're updating with hint bits will
      be gone anyway.  Per an off-list report from Heikki Linnakangas, the wait
      can significantly degrade the performance of unlogged tables; I was
      able to show a 2x speedup from this patch on a pgbench run with scale
      factor 15.  In practice, this will mostly help small, heavily updated
      tables, because on larger tables you're unlikely to run into the same
      row again before the commit record makes it out to disk.
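      The relaxed rule can be expressed as a small predicate. `can_set_commit_hint` and its parameters are hypothetical, and the sketch is a simplification rather than PostgreSQL's actual hint-bit code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* stand-in for a WAL position */

/*
 * May we set a "committed" hint bit now?  For a WAL-logged (permanent)
 * relation, only once the commit record has been flushed; for temporary
 * and unlogged relations, always, since their pages do not survive a crash.
 */
static bool
can_set_commit_hint(bool rel_is_permanent, XLogRecPtr commit_lsn,
                    XLogRecPtr flushed_upto)
{
    if (!rel_is_permanent)
        return true;
    return commit_lsn <= flushed_upto;
}
```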
  6. 28 Oct, 2011: 1 commit
    • Fix the number of lwlocks needed by the "fast path" lock patch. · cbf65509
      Committed by Heikki Linnakangas
      The patch needs one lock per backend or auxiliary process, but the need
      for a lock for each aux process was not accounted for in NumLWLocks().
      No one noticed, because the three locks needed for the three aux
      processes fit into the few extra lwlocks we allocate for 3rd-party
      modules that don't call RequestAddinLWLocks() (NUM_USER_DEFINED_LWLOCKS,
      4 by default).
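      The arithmetic behind the miscount, as a short sketch; the macro values come from the commit message and the helper names are hypothetical.

```c
#include <assert.h>

#define NUM_AUXILIARY_PROCS      3   /* three aux processes, per the message */
#define NUM_USER_DEFINED_LWLOCKS 4   /* default spare locks for add-in modules */

/* Locks the fast-path patch actually needs: one per backend or aux process. */
static int
fast_path_lwlocks_needed(int max_backends)
{
    return max_backends + NUM_AUXILIARY_PROCS;
}

/* What the count reserved before the fix: backends only. */
static int
fast_path_lwlocks_reserved_before_fix(int max_backends)
{
    return max_backends;
}
```

      For 100 backends, 103 locks are needed but only 100 were reserved; the four spare add-in locks absorbed the shortfall with one to spare, which is why the miscount went unnoticed.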
  7. 23 Oct, 2011: 1 commit
    • Support synchronization of snapshots through an export/import procedure. · bb446b68
      Committed by Tom Lane
      A transaction can export a snapshot with pg_export_snapshot(), and then
      others can import it with SET TRANSACTION SNAPSHOT.  The data does not
      leave the server, so there are no security issues.  A snapshot can only
      be imported while the exporting transaction is still running, and there
      are some other restrictions.
      
      I'm not totally convinced that we've covered all the bases for SSI (true
      serializable) mode, but it works fine for lesser isolation modes.
      
      Joachim Wieland, reviewed by Marko Tiikkaja, and rather heavily modified
      by Tom Lane
  8. 21 Oct, 2011: 1 commit
    • Simplify and improve ProcessStandbyHSFeedbackMessage logic. · b4a0223d
      Committed by Tom Lane
      There's no need to clamp the standby's xmin to be greater than
      GetOldestXmin's result; if there were any such need this logic would be
      hopelessly inadequate anyway, because it fails to account for
      within-database versus cluster-wide values of GetOldestXmin.  So get rid of
      that, and just rely on sanity-checking that the xmin is not wrapped around
      relative to the nextXid counter.  Also, don't reset the walsender's xmin if
      the current feedback xmin is indeed out of range; that just creates more
      problems than we already had.  Lastly, don't bother to take the
      ProcArrayLock; there's no need to do that to set xmin.
      
      Also improve the comments about this in GetOldestXmin itself.
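      A sketch of the simplified sanity check, assuming a modulo-2^32 comparison in the style of `TransactionIdPrecedesOrEquals`; `apply_feedback_xmin` is hypothetical, and its handling of non-normal xids is not modeled on the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FIRST_NORMAL_XID ((TransactionId) 3)   /* xids 0-2 are special */

/* Modulo-2^32 "a precedes or equals b" for normal xids. */
static bool
xid_precedes_or_equals(TransactionId a, TransactionId b)
{
    int32_t diff = (int32_t) (a - b);
    return diff <= 0;
}

/*
 * Apply a standby's feedback xmin to the walsender's xmin.  A value that
 * is wrapped around relative to next_xid is ignored, and the existing
 * xmin is left alone rather than reset.  No lock is taken.  Non-normal
 * xids are simply ignored here (an assumption of the sketch).
 */
static void
apply_feedback_xmin(TransactionId *walsender_xmin, TransactionId feedback_xmin,
                    TransactionId next_xid)
{
    if (feedback_xmin < FIRST_NORMAL_XID)
        return;
    if (!xid_precedes_or_equals(feedback_xmin, next_xid))
        return;
    *walsender_xmin = feedback_xmin;
}
```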
  9. 14 Oct, 2011: 1 commit
  10. 11 Oct, 2011: 1 commit
  11. 27 Sep, 2011: 1 commit
    • Allow snapshot references to still work during transaction abort. · 57eb0090
      Committed by Tom Lane
      In REPEATABLE READ (nee SERIALIZABLE) mode, an attempt to do
      GetTransactionSnapshot() between AbortTransaction and CleanupTransaction
      failed, because GetTransactionSnapshot would recompute the transaction
      snapshot (which is already wrong, given the isolation mode) and then
      re-register it in the TopTransactionResourceOwner, leading to an Assert
      because the TopTransactionResourceOwner should be empty of resources after
      AbortTransaction.  This is the root cause of bug #6218 from Yamamoto
      Takashi.  While changing plancache.c to avoid requesting a snapshot when
      handling a ROLLBACK masks the problem, I think this is really a snapmgr.c
      bug: it's lower-level than the resource manager mechanism and should not be
      shutting itself down before we unwind resource manager resources.  However,
      just postponing the release of the transaction snapshot until cleanup time
      didn't work because of the circular dependency with
      TopTransactionResourceOwner.  Fix by managing the internal reference to
      that snapshot manually instead of depending on TopTransactionResourceOwner.
      This saves a few cycles as well as making the module layering more
      straightforward.  predicate.c's dependencies on TopTransactionResourceOwner
      go away too.
      
      I think this is a longstanding bug, but there's no evidence that it's more
      than a latent bug, so it doesn't seem worth any risk of back-patching.
  12. 24 Sep, 2011: 1 commit
    • Memory barrier support for PostgreSQL. · 0c8eda62
      Committed by Robert Haas
      This is not actually used anywhere yet, but it gets the basic
      infrastructure in place.  It is fairly likely that there are bugs, and
      support for some important platforms may be missing, so we'll need to
      refine this as we go along.
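      To illustrate what a memory barrier is for, here is the classic publish/consume pattern using C11 fences from `<stdatomic.h>`, not the macros this commit adds; the variable and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int         shared_payload;   /* ordinary, non-atomic data */
static atomic_bool shared_ready;     /* publication flag, initially false */

/* Writer: the release fence orders the payload store before the flag store. */
static void
publish(int value)
{
    shared_payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&shared_ready, true, memory_order_relaxed);
}

/* Reader: the acquire fence orders the flag load before the payload load. */
static bool
try_consume(int *out)
{
    if (!atomic_load_explicit(&shared_ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = shared_payload;
    return true;
}
```

      Without the two fences, a weakly ordered CPU could let a reader see the flag set while still reading a stale payload.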
  13. 12 Sep, 2011: 1 commit
    • Remove many -Wcast-qual warnings · 1b81c2fe
      Committed by Peter Eisentraut
      This addresses only those cases that are easy to fix by adding or
      moving a const qualifier or removing an unnecessary cast.  There are
      many more complicated cases remaining.
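      A hypothetical example of the easy kind of fix: carrying `const` through a local pointer instead of casting it away, which is what `-Wcast-qual` complains about.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Before (warns under -Wcast-qual):
 *     char *p = (char *) s;
 * After: keep the qualifier on the local pointer instead.
 */
static size_t
count_char(const char *s, char c)
{
    size_t n = 0;

    for (const char *p = s; *p != '\0'; p++)
    {
        if (*p == c)
            n++;
    }
    return n;
}
```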
  14. 10 Sep, 2011: 1 commit
    • Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h. · a7801b62
      Committed by Tom Lane
      As per my recent proposal, this refactors things so that these typedefs and
      macros are available in a header that can be included in frontend-ish code.
      I also changed various headers that were undesirably including
      utils/timestamp.h to include datatype/timestamp.h instead.  Unsurprisingly,
      this showed that half the system was getting utils/timestamp.h by way of
      xlog.h.
      
      No actual code changes here, just header refactoring.
  15. 04 Sep, 2011: 1 commit
    • Clean up the #include mess a little. · 1609797c
      Committed by Tom Lane
      walsender.h should depend on xlog.h, not vice versa.  (Actually, the
      inclusion was circular until a couple hours ago, which was even sillier;
      but Bruce broke it in the expedient rather than logically correct
      direction.)  Because of that poor decision, plus blind application of
      pgrminclude, we had a situation where half the system was depending on
      xlog.h to include such unrelated stuff as array.h and guc.h.  Clean up
      the header inclusion, and manually revert a lot of what pgrminclude had
      done so things build again.
      
      This episode reinforces my feeling that pgrminclude should not be run
      without adult supervision.  Inclusion changes in header files in particular
      need to be reviewed with great care.  More generally, it'd be good if we
      had a clearer notion of module layering to dictate which headers can sanely
      include which others ... but that's a big task for another day.
  16. 01 Sep, 2011: 1 commit
  17. 29 Aug, 2011: 1 commit
    • Improve spinlock performance for HP-UX, ia64, non-gcc. · c01c25fb
      Committed by Robert Haas
      At least on this architecture, it's very important to spin on a
      non-atomic instruction and only retry the atomic once it appears
      that it will succeed.  To fix this, split TAS() into two macros:
      TAS(), for trying to grab the lock the first time, and TAS_SPIN(),
      for spinning until we get it.  TAS_SPIN() defaults to the same as TAS(),
      but we can override it when we know there's a better way.
      
      It's likely that some of the other cases in s_lock.h require
      similar treatment, but this is the only one we've got conclusive
      evidence for at present.
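      The TAS()/TAS_SPIN() split amounts to test-and-test-and-set. This sketch uses portable C11 atomics (a relaxed load for the cheap read) instead of the platform-specific code in s_lock.h; the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* One atomic attempt, like TAS(): returns nonzero if the lock was already held. */
static int
tas(atomic_int *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/* Spin variant, like TAS_SPIN(): cheap read first, atomic only if it looks free. */
static int
tas_spin(atomic_int *lock)
{
    if (atomic_load_explicit(lock, memory_order_relaxed) != 0)
        return 1;
    return tas(lock);
}

static void
spin_lock(atomic_int *lock)
{
    if (tas(lock) == 0)
        return;                 /* uncontended: got it on the first try */
    while (tas_spin(lock) != 0)
        ;                       /* real code would add a delay/backoff here */
}

static void
spin_unlock(atomic_int *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

      Spinning on a plain read keeps the cache line shared among waiters; only when the lock looks free does a waiter issue the expensive atomic exchange.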
  18. 27 Aug, 2011: 1 commit
  19. 23 Aug, 2011: 1 commit
  20. 18 Aug, 2011: 1 commit
    • Remove obsolete README file. · 24bf1552
      Committed by Robert Haas
      Perhaps we ought to add some other kind of documentation here instead,
      but for now let's get rid of this woefully obsolete description of the
      sinval machinery.
  21. 15 Aug, 2011: 1 commit
    • Add "Reason code" prefix to internal SSI error messages · e5475a80
      Committed by Peter Eisentraut
      This makes it clearer that the error message is perhaps not supposed
      to be understood by users, and it also makes it somewhat clearer that
      it was not accidentally omitted from translation.
      
      Idea from Heikki Linnakangas, except that we don't mark "Reason code"
      for translation at this point, because that would make the
      implementation too cumbersome.
  22. 11 Aug, 2011: 1 commit
    • Change the autovacuum launcher to use WaitLatch instead of a poll loop. · 4dab3d5a
      Committed by Tom Lane
      In pursuit of this (and with the expectation that WaitLatch will be needed
      in more places), convert the latch field that was already added to PGPROC
      for sync rep into a generic latch that is activated for all PGPROC-owning
      processes, and change many of the standard backend signal handlers to set
      that latch when a signal happens.  This will allow WaitLatch callers to be
      wakened properly by these signals.
      
      In passing, fix a whole bunch of signal handlers that had been hacked to do
      things that might change errno, without adding the necessary save/restore
      logic for errno.  Also make some minor fixes in unix_latch.c, and clean
      up a bizarre and unsafe scheme for disowning the process's latch.  Much of
      this has to be back-patched into 9.1.
      
      Peter Geoghegan, with additional work by Tom
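      The errno save/restore pattern for a signal handler that sets a latch, as a sketch. The self-pipe write in `set_latch_sketch` is a hypothetical stand-in for SetLatch(); pointing it at fd -1 makes `write()` fail with EBADF, which shows why errno must be restored. The handler would normally be installed with sigaction().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_sighup = 0;
static int latch_pipe_write_fd = -1;   /* hypothetical self-pipe; -1 makes write() fail */

/* Hypothetical latch setter: writing to a self-pipe can change errno. */
static void
set_latch_sketch(void)
{
    if (write(latch_pipe_write_fd, "x", 1) < 0)
    {
        /* ignored here, but errno has now been overwritten (EBADF) */
    }
}

/* Signal handler that sets a flag and the latch, preserving errno. */
static void
sighup_handler(int signo)
{
    int save_errno = errno;

    (void) signo;
    got_sighup = 1;
    set_latch_sketch();
    errno = save_errno;
}
```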
  23. 10 Aug, 2011: 1 commit
    • Documentation improvement and minor code cleanups for the latch facility. · 4e15a4db
      Committed by Tom Lane
      Improve the documentation around weak-memory-ordering risks, and do a pass
      of general editorialization on the comments in the latch code.  Make the
      Windows latch code more like the Unix latch code where feasible; in
      particular provide the same Assert checks in both implementations.
      Fix poorly-placed WaitLatch call in syncrep.c.
      
      This patch resolves, for the moment, concerns around weak-memory-ordering
      bugs in latch-related code: we have documented the restrictions and checked
      that existing calls meet them.  In 9.2 I hope that we will install suitable
      memory barrier instructions in SetLatch/ResetLatch, so that their callers
      don't need to be quite so careful.
  24. 05 Aug, 2011: 1 commit
    • Create VXID locks "lazily" in the main lock table. · 84e37126
      Committed by Robert Haas
      Instead of entering them on transaction startup, we materialize them
      only when someone wants to wait, which will occur only during CREATE
      INDEX CONCURRENTLY.  In Hot Standby mode, the startup process must also
      be able to probe for conflicting VXID locks, but the lock need never be
      fully materialized, because the startup process does not use the normal
      lock wait mechanism.  Since most VXID locks never need to touch the
      lock manager partition locks, this can significantly reduce blocking
      contention on read-heavy workloads.
      
      Patch by me.  Review by Jeff Davis.
  25. 03 Aug, 2011: 2 commits
    • Move CheckRecoveryConflictDeadlock() call to a safer place. · ac36e6f7
      Committed by Tom Lane
      This kluge was inserted in a spot apparently chosen at random: the lock
      manager's state is not yet fully set up for the wait, and in particular
      LockWaitCancel hasn't been armed by setting lockAwaited, so the ProcLock
      will not get cleaned up if the ereport is thrown.  This seems to not cause
      any observable problem in trivial test cases, because LockReleaseAll will
      silently clean up the debris; but I was able to cause failures with tests
      involving subtransactions.
      
      Fixes breakage induced by commit c85c9414.
      Back-patch to all affected branches.
    • Fix incorrect initialization of ProcGlobal->startupBufferPinWaitBufId. · 2e53bd55
      Committed by Tom Lane
      It was initialized in the wrong place and to the wrong value.  With bad
      luck this could result in incorrect query-cancellation failures in hot
      standby sessions, should a HS backend be holding a pin on buffer number 1
      while trying to acquire a lock.
  26. 01 Aug, 2011: 1 commit
  27. 30 Jul, 2011: 1 commit
    • Reduce sinval synchronization overhead. · b4fbe392
      Committed by Robert Haas
      Testing shows that the overhead of acquiring and releasing
      SInvalReadLock and msgNumLock on high-core count boxes can waste a lot
      of CPU time and hurt performance.  This patch adds a per-backend flag
      that allows us to skip all that locking in most cases.  Further
      testing shows that this improves performance even when sinval traffic
      is very high.
      
      Patch by me.  Review and testing by Noah Misch.
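      A single-threaded sketch of the per-backend flag, assuming hypothetical names; the locks themselves are not modeled, and `slow_path_count` only marks where a reader would acquire them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BACKENDS 4   /* hypothetical */

typedef struct BackendSinvalState
{
    atomic_bool has_messages;   /* set by senders, cleared by the owner */
    int         next_msg;       /* next message number to read */
} BackendSinvalState;

static BackendSinvalState backends[MAX_BACKENDS];
static int max_msg_num;          /* would be protected by the writer's lock */
static int slow_path_count;      /* how often a reader took the locked path */

static void
send_messages(int n)
{
    max_msg_num += n;
    for (int i = 0; i < MAX_BACKENDS; i++)
        atomic_store(&backends[i].has_messages, true);
}

/* Returns the number of new messages this backend consumed. */
static int
receive_messages(BackendSinvalState *me)
{
    int n;

    if (!atomic_load(&me->has_messages))
        return 0;                 /* common case: no locks taken at all */

    /* Clear first, so a send that races with us sets the flag again. */
    atomic_store(&me->has_messages, false);
    slow_path_count++;            /* the read locks would be acquired here */
    n = max_msg_num - me->next_msg;
    me->next_msg = max_msg_num;
    return n;
}
```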
  28. 28 Jul, 2011: 1 commit
  29. 20 Jul, 2011: 1 commit
    • Some refinement for the "fast path" lock patch. · 8e5ac74c
      Committed by Robert Haas
      1. In GetLockStatusData, avoid initializing instance before we've ensured
      that the array is large enough.  Otherwise, if repalloc moves the block
      around, we're hosed.
      
      2. Add the word "Relation" to the name of some identifiers, to avoid
      assuming that the fast-path mechanism will only ever apply to relations
      (though these particular parts certainly will).  Some of the macros
      could possibly use similar treatment, but the names are getting awfully
      long already.
      
      3. Add a missing word to comment in AtPrepare_Locks().
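      Item 1 describes a classic reallocation hazard. A sketch with hypothetical types, using plain `realloc()` in place of `repalloc()`: the element pointer is computed only after the array has been grown.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct LockInstanceSketch
{
    unsigned int relid;
    int          mode;
} LockInstanceSketch;

typedef struct LockStatusSketch
{
    LockInstanceSketch *locks;
    int                 nelements;
    int                 allocated;
} LockStatusSketch;

/*
 * Return a pointer to the next free slot, growing the array first if needed.
 * Taking &data->locks[...] before the realloc() could leave a dangling
 * pointer if the block moves.
 */
static LockInstanceSketch *
next_instance(LockStatusSketch *data)
{
    if (data->nelements >= data->allocated)
    {
        int newsize = data->allocated > 0 ? data->allocated * 2 : 4;
        LockInstanceSketch *grown =
            realloc(data->locks, (size_t) newsize * sizeof(LockInstanceSketch));

        if (grown == NULL)
            return NULL;
        data->locks = grown;
        data->allocated = newsize;
    }
    return &data->locks[data->nelements++];
}
```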
  30. 19 Jul, 2011: 1 commit
  31. 18 Jul, 2011: 3 commits
    • Create a "fast path" for acquiring weak relation locks. · 3cba8999
      Committed by Robert Haas
      When an AccessShareLock, RowShareLock, or RowExclusiveLock is requested
      on an unshared database relation, and we can verify that no conflicting
      locks can possibly be present, record the lock in a per-backend queue,
      stored within the PGPROC, rather than in the primary lock table.  This
      eliminates a great deal of contention on the lock manager LWLocks.
      
      This patch also refactors the interface between GetLockStatusData() and
      pg_lock_status() to be a bit more abstract, so that we don't rely so
      heavily on the lock manager's internal representation details.  The new
      fast path lock structures don't have a LOCK or PROCLOCK structure to
      return, so we mustn't depend on that for purposes of listing outstanding
      locks.
      
      Review by Jeff Davis.
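      A sketch of the fast-path decision, assuming hypothetical names and sizes (16 slots per backend, 1024 strong-lock partitions). It omits the step in which a strong lock request moves existing fast-path entries from all backends into the main lock table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

#define FP_SLOTS_PER_BACKEND   16     /* assumed per-backend slot count */
#define STRONG_LOCK_PARTITIONS 1024   /* hypothetical partition count */

typedef enum
{
    LOCK_ACCESS_SHARE,
    LOCK_ROW_SHARE,
    LOCK_ROW_EXCLUSIVE,
    LOCK_ACCESS_EXCLUSIVE     /* stands in for every "strong" mode */
} LockModeSketch;

/* Per-backend fast-path slots, conceptually stored in the PGPROC. */
typedef struct FastPathSlots
{
    Oid            relid[FP_SLOTS_PER_BACKEND];
    LockModeSketch mode[FP_SLOTS_PER_BACKEND];
    int            used;
} FastPathSlots;

/* Shared counters: how many strong locks exist per hash partition. */
static int strong_lock_counts[STRONG_LOCK_PARTITIONS];
static int main_table_entries;   /* stand-in for the shared lock table */

static bool
is_weak_mode(LockModeSketch mode)
{
    return mode == LOCK_ACCESS_SHARE || mode == LOCK_ROW_SHARE ||
           mode == LOCK_ROW_EXCLUSIVE;
}

/* Returns true if the lock was recorded in the backend's own slots. */
static bool
acquire_relation_lock(FastPathSlots *me, Oid relid, LockModeSketch mode,
                      bool shared_relation)
{
    int partition = (int) (relid % STRONG_LOCK_PARTITIONS);

    if (!shared_relation && is_weak_mode(mode) &&
        strong_lock_counts[partition] == 0 &&
        me->used < FP_SLOTS_PER_BACKEND)
    {
        me->relid[me->used] = relid;
        me->mode[me->used] = mode;
        me->used++;
        return true;
    }

    if (!is_weak_mode(mode))
        strong_lock_counts[partition]++;
    main_table_entries++;
    return false;
}
```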
    • Further thoughts about temp_file_limit patch. · 9473bb96
      Committed by Tom Lane
      Move FileClose's decrement of temporary_files_size up, so that it will be
      executed even if elog() throws an error.  This is reasonable since if the
      unlink() fails, the fact the file is still there is not our fault, and we
      are going to forget about it anyhow.  So we won't count it against
      temp_file_limit anymore.
      
      Update fileSize and temporary_files_size correctly in FileTruncate.
      We probably don't have any places that truncate temp files, but fd.c
      surely should not assume that.
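      The accounting described above, together with the session-wide limit it supports, sketched with hypothetical helpers and sizes in bytes (the real setting is expressed in kilobytes, and -1 means no limit).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int64_t temp_file_limit_bytes = -1;   /* -1: no limit */
static int64_t temporary_files_size = 0;     /* session-wide total */

typedef struct TempFileSketch
{
    int64_t size;    /* bytes this file currently contributes */
} TempFileSketch;

/* Grow a temp file; false means the caller would raise an error. */
static bool
temp_file_grow(TempFileSketch *f, int64_t new_size)
{
    if (new_size > f->size)
    {
        int64_t delta = new_size - f->size;

        if (temp_file_limit_bytes >= 0 &&
            temporary_files_size + delta > temp_file_limit_bytes)
            return false;
        temporary_files_size += delta;
        f->size = new_size;
    }
    return true;
}

/* Truncation must give space back, too. */
static void
temp_file_truncate(TempFileSketch *f, int64_t new_size)
{
    if (new_size < f->size)
    {
        temporary_files_size -= f->size - new_size;
        f->size = new_size;
    }
}

/*
 * Close: drop the file from the total *before* unlinking, so that an
 * error reported for a failed unlink cannot leave it counted.
 */
static void
temp_file_close(TempFileSketch *f)
{
    temporary_files_size -= f->size;
    f->size = 0;
    /* ... unlink() and error reporting would follow here ... */
}
```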
    • Add temp_file_limit GUC parameter to constrain temporary file space usage. · 23e5b16c
      Committed by Tom Lane
      The limit is enforced against the total amount of temp file space used by
      each session.
      
      Mark Kirkwood, reviewed by Cédric Villemain and Tatsuo Ishii
  32. 17 Jul, 2011: 2 commits
  33. 09 Jul, 2011: 1 commit
    • Try to acquire relation locks in RangeVarGetRelid. · 4240e429
      Committed by Robert Haas
      In the previous coding, we would look up a relation in RangeVarGetRelid,
      lock the resulting OID, and then AcceptInvalidationMessages().  While
      this was sufficient to ensure that we noticed any changes to the
      relation definition before building the relcache entry, it didn't
      handle the possibility that the name we looked up no longer referenced
      the same OID.  This was particularly problematic in the case where a
      table had been dropped and recreated: we'd latch on to the entry for
      the old relation and fail later on.  Now, we acquire the relation lock
      inside RangeVarGetRelid, and retry the name lookup if we notice that
      invalidation messages have been processed meanwhile.  Many operations
      that would previously have failed with an error in the presence of
      concurrent DDL will now succeed.
      
      There is a good deal of work remaining to be done here: many callers
      of RangeVarGetRelid still pass NoLock for one reason or another.  In
      addition, nothing in this patch guards against the possibility that
      the meaning of an unqualified name might change due to the creation
      of a relation in a schema earlier in the user's search path than the
      one where it was previously found.  Furthermore, there's nothing at
      all here to guard against similar race conditions for non-relations.
      For all that, it's a start.
      
      Noah Misch and Robert Haas
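      The lookup, lock and recheck loop, as a sketch with hypothetical catalog stand-ins; the first lock call simulates a concurrent drop and re-create of the table, so the loop must retry and end up holding a lock on the new OID.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;
#define INVALID_OID ((Oid) 0)

/* Hypothetical stand-ins for catalog and invalidation state. */
static unsigned long inval_counter;        /* bumped when invalidations are processed */
static Oid current_oid_for_name = 1000;
static int lock_calls;
static int unlock_calls;

static Oid lookup_name(void) { return current_oid_for_name; }

/* Locking processes pending invalidations; the first call simulates a
 * concurrent DROP + CREATE of the table under the same name. */
static void
lock_relation_oid(Oid oid)
{
    (void) oid;
    lock_calls++;
    if (lock_calls == 1)
    {
        current_oid_for_name = 2000;
        inval_counter++;
    }
}

static void unlock_relation_oid(Oid oid) { (void) oid; unlock_calls++; }

/* Look up a name and lock it, retrying if invalidations arrived meanwhile. */
static Oid
lookup_and_lock(void)
{
    Oid  old_oid = INVALID_OID;
    bool retrying = false;

    for (;;)
    {
        unsigned long counter_before = inval_counter;
        Oid oid = lookup_name();

        if (retrying)
        {
            if (oid == old_oid)
                return oid;             /* same relation, already locked */
            if (old_oid != INVALID_OID)
                unlock_relation_oid(old_oid);
        }
        if (oid != INVALID_OID)
            lock_relation_oid(oid);
        if (inval_counter == counter_before)
            return oid;                 /* nothing changed while we locked it */
        retrying = true;
        old_oid = oid;
    }
}
```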