1. 07 Feb 2016 (1 commit)
    • Introduce group locking to prevent parallel processes from deadlocking. · a1c1af2a
      Committed by Robert Haas
      For locking purposes, we now regard heavyweight locks as mutually
      non-conflicting between cooperating parallel processes.  There are some
      possible pitfalls to this approach that are not to be taken lightly,
      but it works OK for now and can be changed later if we find a better
      approach.  Without this, it's very easy for parallel queries to
      silently self-deadlock if the user backend holds strong relation locks.
      
      Robert Haas, with help from Amit Kapila.  Thanks to Noah Misch and
      Andres Freund for extensive discussion of possible issues with this
      approach.
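
      Below is a minimal, self-contained C model (not PostgreSQL source; the
      types and the modes_conflict table are invented for illustration) of the
      rule this commit describes: when checking heavyweight-lock conflicts,
      holders that belong to the requester's own lock group are skipped, so
      cooperating parallel workers never block or deadlock against each other.

        #include <stdbool.h>
        #include <stdio.h>

        /* Toy stand-in for a lock holder; leaderPid identifies its lock group. */
        typedef struct Holder { int pid; int leaderPid; int lockMode; } Holder;

        /* Hypothetical conflict table: here only mode 2 conflicts with itself. */
        static bool modes_conflict(int a, int b) { return a == 2 && b == 2; }

        /* True if any holder outside the requester's lock group conflicts. */
        static bool
        lock_conflicts(const Holder *holders, int n, int reqLeaderPid, int reqMode)
        {
            for (int i = 0; i < n; i++)
            {
                if (holders[i].leaderPid == reqLeaderPid)
                    continue;        /* same group: treated as non-conflicting */
                if (modes_conflict(holders[i].lockMode, reqMode))
                    return true;
            }
            return false;
        }

        int
        main(void)
        {
            Holder held[] = { { 100, 100, 2 } };   /* leader (pid 100) holds mode 2 */

            /* A worker in group 100 gets in; a backend from group 200 must wait. */
            printf("worker conflicts: %d, outsider conflicts: %d\n",
                   lock_conflicts(held, 1, 100, 2), lock_conflicts(held, 1, 200, 2));
            return 0;
        }
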
  2. 29 Jan 2016 (1 commit)
    • Migrate PGPROC's backendLock into PGPROC itself, using a new tranche. · b319356f
      Committed by Robert Haas
      Previously, each PGPROC's backendLock was part of the main tranche,
      and the PGPROC just contained a pointer.  Now, the actual LWLock is
      part of the PGPROC.
      
      As with previous, similar patches, this makes it significantly easier
      to identify these lwlocks in LWLOCK_STATS or Trace_lwlocks output
      and improves modularity.
      
      Author: Ildus Kurbangaliev
      Reviewed-by: Amit Kapila, Robert Haas
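
      A hedged structural sketch of the change (the stand-in types below are
      illustrative, not the real PGPROC or LWLock layout): the lock moves from
      being pointed at to being embedded, and acquisition changes from
      proc->backendLock to &proc->backendLock.

        /* Stand-ins for the real types; the field layout is illustrative only. */
        typedef struct LWLock { int tranche; int state; } LWLock;

        /* Before: the PGPROC carried only a pointer into the main LWLock array. */
        typedef struct PGPROC_Before { int pid; LWLock *backendLock; } PGPROC_Before;

        /* After: the LWLock is embedded in the PGPROC itself and registered
         * under its own tranche, so LWLOCK_STATS / Trace_lwlocks output can
         * attribute waits to "a backend lock" rather than "main array slot N". */
        typedef struct PGPROC_After { int pid; LWLock backendLock; } PGPROC_After;

        /* Callers now take the address of the embedded lock. */
        static LWLock *
        backend_lock_of(PGPROC_After *proc)
        {
            return &proc->backendLock;
        }
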
  3. 03 Jan 2016 (1 commit)
  4. 28 Sep 2015 (1 commit)
  5. 21 Sep 2015 (1 commit)
    • Be more wary about partially-valid LOCALLOCK data in RemoveLocalLock(). · ba51774d
      Committed by Tom Lane
      RemoveLocalLock() must consider the possibility that LockAcquireExtended()
      failed to palloc the initial space for a locallock's lockOwners array.
      I had evidently meant to cope with this hazard when the code was originally
      written (commit 1785aceb), but missed that
      the pfree needed to be protected with an if-test.  Just to make sure things
      are left in a clean state, reset numLockOwners as well.
      
      Per low-memory testing by Andreas Seltenreich.  Back-patch to all supported
      branches.
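
      A hedged sketch of the defensive cleanup described above, following the
      shape of RemoveLocalLock() in lock.c (simplified; it relies on backend
      types and functions such as LOCALLOCK, ResourceOwnerForgetLock() and
      pfree(), and is not compilable on its own).

        /* Sketch: the lockOwners array may never have been allocated if
         * LockAcquireExtended() hit out-of-memory, so guard the pfree and
         * leave the counters in a consistent state either way. */
        static void
        remove_local_lock_sketch(LOCALLOCK *locallock)
        {
            int     i;

            for (i = locallock->numLockOwners - 1; i >= 0; i--)
            {
                if (locallock->lockOwners[i].owner != NULL)
                    ResourceOwnerForgetLock(locallock->lockOwners[i].owner, locallock);
            }
            locallock->numLockOwners = 0;        /* reset even if the array is missing */
            if (locallock->lockOwners != NULL)   /* the palloc may have failed earlier */
                pfree(locallock->lockOwners);
            locallock->lockOwners = NULL;

            /* ... removal of the LOCALLOCK hash entry itself follows here ... */
        }
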
  6. 30 Jan 2015 (1 commit)
    • Properly terminate the array returned by GetLockConflicts(). · 17792bfc
      Committed by Andres Freund
      GetLockConflicts() has for a long time not properly terminated the
      returned array.  During normal processing the returned array is
      zero-initialized, which, while not pretty, is sufficient for the end to
      be recognized as an invalid virtual transaction id.  But the HotStandby
      case is more than aesthetically broken: the allocated (and reused)
      array is neither zeroed upon allocation, nor reinitialized, nor
      terminated.
      
      Not having a terminating element means that the end of the array is not
      recognized, so recovery conflict handling reads ahead into adjacent
      memory and only stops when it happens to hit content that looks like an
      invalid virtual transaction id.  Luckily this seems so far not to have
      caused significant problems, beyond making recovery conflict handling
      more expensive.
      
      Discussion: 20150127142713.GD29457@awork2.anarazel.de
      
      Backpatch into all supported branches.
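
      A minimal, self-contained illustration (toy types, not backend code) of
      why the terminator matters: the consumer gets no element count, so it
      walks the array until it sees an invalid entry, and the producer must
      therefore always write one past the last real element.

        #include <stdio.h>
        #include <string.h>

        /* Toy stand-in for a virtual transaction id; 0/0 plays "invalid". */
        typedef struct VXid { int backendId; int localXid; } VXid;
        #define VXID_IS_INVALID(v) ((v).backendId == 0 && (v).localXid == 0)

        static void
        report_conflicts(const VXid *vxids)
        {
            /* The reader relies entirely on the terminator to find the end. */
            for (int i = 0; !VXID_IS_INVALID(vxids[i]); i++)
                printf("conflict with %d/%d\n", vxids[i].backendId, vxids[i].localXid);
        }

        int
        main(void)
        {
            VXid    buf[8];
            int     n = 0;

            memset(buf, 0, sizeof(buf));    /* zeroing, or the explicit store  */
            buf[n++] = (VXid){ 3, 1234 };   /* below, is what keeps the reader */
            buf[n++] = (VXid){ 7, 99 };     /* from running off into adjacent  */
            buf[n]   = (VXid){ 0, 0 };      /* memory: an explicit terminator  */
            report_conflicts(buf);
            return 0;
        }
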
  7. 07 Jan 2015 (1 commit)
  8. 19 Dec 2014 (1 commit)
    • Improve hash_create's API for selecting simple-binary-key hash functions. · 4a14f13a
      Committed by Tom Lane
      Previously, if you wanted anything besides C-string hash keys, you had to
      specify a custom hashing function to hash_create().  Nearly all such
      callers were specifying tag_hash or oid_hash, which is tedious and rather
      error-prone, since a caller could easily miss the opportunity to optimize
      by using hash_uint32 when appropriate.  Replace this with a design whereby
      callers using simple binary-data keys just specify HASH_BLOBS and don't
      need to mess with specific support functions.  hash_create() itself will
      take care of optimizing when the key size is four bytes.
      
      This nets out saving a few hundred bytes of code space, and offers
      a measurable performance improvement in tidbitmap.c (which was not
      exploiting the opportunity to use hash_uint32 for its 4-byte keys).
      There might be some wins elsewhere too, but I didn't analyze closely.
      
      In future we could look into offering a similar optimized hashing function
      for 8-byte keys.  Under this design that could be done in a centralized
      and machine-independent fashion, whereas getting it right for keys of
      platform-dependent sizes would've been notationally painful before.
      
      For the moment, the old way still works fine, so as not to break source
      code compatibility for loadable modules.  Eventually we might want to
      remove tag_hash and friends from the exported API altogether, since there's
      no real need for them to be explicitly referenced from outside dynahash.c.
      
      Teodor Sigaev and Tom Lane
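
      A hedged sketch of what a backend caller with a fixed-size binary key
      looks like under the new convention (the table name and entry struct
      here are invented; the HASHCTL fields and the HASH_ELEM / HASH_BLOBS /
      HASH_FUNCTION flags are the dynahash interface the commit describes).

        #include "postgres.h"
        #include "utils/hsearch.h"

        /* Hypothetical entry type keyed by an Oid. */
        typedef struct MyCacheEntry
        {
            Oid     relid;              /* hash key -- must be first */
            int     usage_count;
        } MyCacheEntry;

        static HTAB *
        create_my_cache(void)
        {
            HASHCTL ctl;

            MemSet(&ctl, 0, sizeof(ctl));
            ctl.keysize = sizeof(Oid);
            ctl.entrysize = sizeof(MyCacheEntry);

            /* Old style (still accepted): set ctl.hash = oid_hash and pass
             * HASH_FUNCTION.  New style: just say HASH_BLOBS and let
             * hash_create() pick hash_uint32 for 4-byte keys, tag_hash
             * otherwise. */
            return hash_create("my cache (example)", 128, &ctl,
                               HASH_ELEM | HASH_BLOBS);
        }
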
  9. 06 Nov 2014 (1 commit)
    • Move the backup-block logic from XLogInsert to a new file, xloginsert.c. · 2076db2a
      Committed by Heikki Linnakangas
      xlog.c is huge; this makes it a little bit smaller, which is nice.  Functions
      related to putting together a WAL record are now in xloginsert.c, while the
      lower-level code for managing WAL buffers and such stays in xlog.c.
      
      Also move the definition of XLogRecord to a separate header file.  This
      causes churn in the #includes of all the files that write WAL records or
      contain redo routines, but it avoids pulling xlog.h into most places.
      
      Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
  10. 24 Jul 2014 (1 commit)
  11. 07 May 2014 (1 commit)
    • pgindent run for 9.4 · 0a783200
      Committed by Bruce Momjian
      This includes removing tabs after periods in C comments, which was
      applied to back branches, so this change should not affect backpatching.
  12. 07 Apr 2014 (1 commit)
    • Assert that strong-lock count is >0 everywhere it's decremented. · 315772e4
      Committed by Robert Haas
      The one existing assertion of this type has tripped a few times in the
      buildfarm lately, but it's not clear whether the problem is really
      originating there or whether it's leftovers from a trip through one
      of the other two paths that lack a matching assertion.  So add one.
      
      Since the same bug(s) most likely exist(s) in the back-branches also,
      back-patch to 9.2, where the fast-path lock mechanism was added.
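
      The shape of the added check, sketched (the real code decrements a
      per-partition counter in the shared FastPathStrongRelationLocks
      structure while holding its spinlock; fasthashcode is the partition
      index).

        /* Sketch: assert before the decrement, so a bogus zero count trips the
         * assertion at the faulty path itself rather than somewhere later. */
        Assert(FastPathStrongRelationLocks->count[fasthashcode] > 0);
        FastPathStrongRelationLocks->count[fasthashcode]--;
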
  13. 01 Apr 2014 (1 commit)
    • Mark FastPathStrongRelationLocks volatile. · 4bc15a8b
      Committed by Robert Haas
      Otherwise, the compiler might decide to move modifications to data
      within this structure outside the enclosing SpinLockAcquire /
      SpinLockRelease pair, leading to shared memory corruption.
      
      This may or may not explain a recent lmgr-related buildfarm failure
      on prairiedog, but it needs to be fixed either way.
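
      A hedged sketch of the idiom the fix enforces (this predates the
      backend's C11-style atomics): reach the shared structure through a
      volatile-qualified pointer so the compiler cannot move the store outside
      the spinlock-protected section.

        /* Sketch: the counter lives in shared memory, guarded by a spinlock. */
        volatile FastPathStrongRelationLockData *fpsrl = FastPathStrongRelationLocks;

        SpinLockAcquire(&fpsrl->mutex);
        fpsrl->count[fasthashcode]++;       /* cannot be hoisted past the pair */
        SpinLockRelease(&fpsrl->mutex);
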
  14. 28 Jan 2014 (1 commit)
    • Relax the requirement that all lwlocks be stored in a single array. · ea9df812
      Committed by Robert Haas
      This makes it possible to store lwlocks as part of some other data
      structure in the main shared memory segment, or in a dynamic shared
      memory segment.  There is still a main LWLock array and this patch does
      not move anything out of it, but it provides necessary infrastructure
      for doing that in the future.
      
      This change is likely to increase the size of LWLockPadded on some
      platforms, especially 32-bit platforms where it was previously only
      16 bytes.
      
      Patch by me.  Review by Andres Freund and KaiGai Kohei.
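
      For context, the padded wrapper this change builds on looks roughly like
      the union below, and the point of the commit is that such a union may
      now live inside some other shared-memory structure instead of only in
      the main array (the pad size shown is illustrative; the real value is
      platform-dependent, which is why 32-bit builds grow).

        #define LWLOCK_PADDED_SIZE  32      /* illustrative; platform-dependent */

        typedef union LWLockPadded
        {
            LWLock  lock;
            char    pad[LWLOCK_PADDED_SIZE];    /* keep a fixed stride per lock */
        } LWLockPadded;

        /* A hypothetical shared structure can now carry its own locks: */
        typedef struct MySharedThing
        {
            int          nitems;
            LWLockPadded itemLocks[16];     /* not part of the main LWLock array */
        } MySharedThing;
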
  15. 08 Jan 2014 (1 commit)
  16. 16 Dec 2013 (1 commit)
  17. 30 Nov 2013 (1 commit)
    • Be sure to release proc->backendLock after SetupLockInTable() failure. · 8b151558
      Committed by Tom Lane
      The various places that transferred fast-path locks to the main lock table
      neglected to release the PGPROC's backendLock if SetupLockInTable failed
      due to being out of shared memory.  In most cases this is no big deal since
      ensuing error cleanup would release all held LWLocks anyway.  But there are
      some hot-standby functions that don't consider failure of
      FastPathTransferRelationLocks to be a hard error, and in those cases this
      oversight could lead to system lockup.  For consistency, make all of these
      places look the same as FastPathTransferRelationLocks.
      
      Noted while looking for the cause of Dan Wood's bugs --- this wasn't it,
      but it's a bug anyway.
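
      A hedged sketch of the error path being made consistent, following the
      shape of FastPathTransferRelationLocks (simplified, not the verbatim
      code; proc, locktag, hashcode and lockmode come from the surrounding
      loop).

        proclock = SetupLockInTable(lockMethodTable, proc, &locktag,
                                    hashcode, lockmode);
        if (!proclock)
        {
            LWLockRelease(partitionLock);
            LWLockRelease(proc->backendLock);   /* the fix: don't leak this one */
            ereport(ERROR,
                    (errcode(ERRCODE_OUT_OF_MEMORY),
                     errmsg("out of shared memory"),
                     errhint("You might need to increase max_locks_per_transaction.")));
        }
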
  18. 29 Nov 2013 (1 commit)
    • Fix latent(?) race condition in LockReleaseAll. · da8a7160
      Committed by Tom Lane
      We have for a long time checked the head pointer of each of the backend's
      proclock lists and skipped acquiring the corresponding locktable partition
      lock if the head pointer was NULL.  This was safe enough in the days when
      proclock lists were changed only by the owning backend, but it is pretty
      questionable now that the fast-path patch added cases where backends add
      entries to other backends' proclock lists.  However, we don't really wish
      to revert to locking each partition lock every time, because in simple
      transactions that would add a lot of useless lock/unlock cycles on
      already-heavily-contended LWLocks.  Fortunately, the only way that another
      backend could be modifying our proclock list at this point would be if it
      was promoting a formerly fast-path lock of ours; and any such lock must be
      one that we'd decided not to delete in the previous loop over the locallock
      table.  So it's okay if we miss seeing it in this loop; we'd just decide
      not to delete it again.  However, once we've detected a non-empty list,
      we'd better re-fetch the list head pointer after acquiring the partition
      lock.  This guards against possibly fetching a corrupt-but-non-null pointer
      if pointer fetch/store isn't atomic.  It's not clear if any practical
      architectures are like that, but we've never assumed that before and don't
      wish to start here.  In any case, the situation certainly deserves a code
      comment.
      
      While at it, refactor the partition traversal loop to use a for() construct
      instead of a while() loop with goto's.
      
      Back-patch, just in case the risk is real and not hypothetical.
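
      A hedged sketch of the access pattern after this commit, roughly as the
      9.3-era code looked (declarations trimmed and the per-PROCLOCK work
      elided; the point is the cheap unlocked check followed by a second fetch
      of the list head once the partition lock is held).

        for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
        {
            LWLockId    partitionLock = FirstLockMgrLock + partition;
            SHM_QUEUE  *procLocks = &(MyProc->myProcLocks[partition]);
            PROCLOCK   *proclock;
            PROCLOCK   *nextplock;

            /* Cheap unlocked test: if the list looks empty, skip the partition.
             * The only concurrent change possible here is another backend
             * promoting one of our own fast-path locks, and that is a lock we
             * already decided to keep, so missing it is harmless. */
            if (SHMQueueNext(procLocks, procLocks,
                             offsetof(PROCLOCK, procLink)) == NULL)
                continue;

            LWLockAcquire(partitionLock, LW_EXCLUSIVE);

            /* Re-fetch the head now that we hold the lock, in case the earlier
             * unlocked read saw a half-written pointer. */
            for (proclock = (PROCLOCK *) SHMQueueNext(procLocks, procLocks,
                                                      offsetof(PROCLOCK, procLink));
                 proclock;
                 proclock = nextplock)
            {
                /* ... compute nextplock, then release or keep this PROCLOCK ... */
            }

            LWLockRelease(partitionLock);
        }
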
  19. 28 Nov 2013 (1 commit)
    • Fix stale-pointer problem in fast-path locking logic. · 7db285af
      Committed by Tom Lane
      When acquiring a lock in fast-path mode, we must reset the locallock
      object's lock and proclock fields to NULL.  They are not necessarily that
      way to start with, because the locallock could be left over from a failed
      lock acquisition attempt earlier in the transaction.  Failure to do this
      led to all sorts of interesting misbehaviors when LockRelease tried to
      clean up no-longer-related lock and proclock objects in shared memory.
      Per report from Dan Wood.
      
      In passing, modify LockRelease to elog not just Assert if it doesn't find
      lock and proclock objects for a formerly fast-path lock, matching the code
      in FastPathGetRelationLockEntry and LockRefindAndRelease.  This isn't a
      bug but it will help in diagnosing any future bugs in this area.
      
      Also, modify FastPathTransferRelationLocks and FastPathGetRelationLockEntry
      to break out of their loops over the fastpath array once they've found the
      sole matching entry.  This was inconsistently done in some search loops
      and not others.
      
      Improve assorted related comments, too.
      
      Back-patch to 9.2 where the fast-path mechanism was introduced.
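
      The crux of the first fix, sketched (acquired_via_fast_path stands in
      for the real eligibility-and-success test in LockAcquireExtended; field
      and function names follow lock.c, but this is a simplification, not the
      committed code).

        if (acquired_via_fast_path)
        {
            /* The LOCALLOCK may be left over from a failed ordinary lock
             * acquisition earlier in the transaction; clear its shared-memory
             * pointers so a later LockRelease doesn't chase lock/proclock
             * objects that were never ours (or no longer exist). */
            locallock->lock = NULL;
            locallock->proclock = NULL;
            GrantLockLocal(locallock, owner);
            return LOCKACQUIRE_OK;
        }
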
  20. 08 Nov 2013 (1 commit)
  21. 17 Sep 2013 (1 commit)
  22. 05 Jun 2013 (1 commit)
    • Fix memory leak in LogStandbySnapshot(). · dbc6eb1f
      Committed by Tom Lane
      The array allocated by GetRunningTransactionLocks() needs to be pfree'd
      when we're done with it.  Otherwise we leak some memory during each
      checkpoint, if wal_level = hot_standby.  This manifests as memory bloat
      in the checkpointer process, or in bgwriter in versions before we made
      the checkpointer separate.
      
      Reported and fixed by Naoya Anzai.  Back-patch to 9.0 where the issue
      was introduced.
      
      In passing, improve comments for GetRunningTransactionLocks(), and add
      an Assert that we didn't overrun the palloc'd array.
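
      A hedged sketch of the corrected call site (simplified from the shape of
      LogStandbySnapshot(); GetRunningTransactionLocks() returns a palloc'd
      array and reports its length through the output parameter).

        xl_standby_lock *locks;
        int     nlocks;

        locks = GetRunningTransactionLocks(&nlocks);
        if (nlocks > 0)
            LogAccessExclusiveLocks(nlocks, locks);
        pfree(locks);               /* previously leaked once per checkpoint */
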
  23. 30 May 2013 (1 commit)
  24. 25 Apr 2013 (1 commit)
  25. 23 Jan 2013 (1 commit)
    • Improve concurrency of foreign key locking · 0ac5ad51
      Committed by Alvaro Herrera
      This patch introduces two additional lock modes for tuples: "SELECT FOR
      KEY SHARE" and "SELECT FOR NO KEY UPDATE".  These don't block each
      other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
      FOR UPDATE".  UPDATE commands that do not modify the values stored in
      the columns that are part of the key of the tuple now grab a SELECT FOR
      NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
      with tuple locks of the FOR KEY SHARE variety.
      
      Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
      means the concurrency improvement applies to them, which is the whole
      point of this patch.
      
      The added tuple lock semantics require some rejiggering of the multixact
      module, so that the locking level that each transaction is holding can
      be stored alongside its Xid.  Also, multixacts now need to persist
      across server restarts and crashes, because they can now represent not
      only tuple locks, but also tuple updates.  This means we need more
      careful tracking of lifetime of pg_multixact SLRU files; since they now
      persist longer, we require more infrastructure to figure out when they
      can be removed.  pg_upgrade also needs to be careful to copy
      pg_multixact files over from the old server to the new, or at least part
      of multixact.c state, depending on the versions of the old and new
      servers.
      
      Tuple time qualification rules (HeapTupleSatisfies routines) need to be
      careful not to consider tuples with the "is multi" infomask bit set as
      being only locked; they might need to look up MultiXact values (i.e.
      possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
      whereas they previously were assured to only use information readily
      available from the tuple header.  This is considered acceptable, because
      the extra I/O would involve cases that would previously cause some
      commands to block waiting for concurrent transactions to finish.
      
      Another important change is the fact that locking tuples that have
      previously been updated causes the future versions to be marked as
      locked, too; this is essential for correctness of foreign key checks.
      This also causes additional WAL-logging (there was previously a single
      WAL record for a locked tuple; now there is one for each updated copy
      of the tuple that exists).
      
      With all this in place, contention related to tuples being checked by
      foreign key rules should be much reduced.
      
      As a bonus, this fixes an old misbehavior: if a subtransaction grabbed a
      stronger tuple lock than the parent (sub)transaction held on a given
      tuple and later aborted, the weaker lock used to be lost.
      
      Many new spec files were added for isolation tester framework, to ensure
      overall behavior is sane.  There's probably room for several more tests.
      
      There were several reviewers of this patch; in particular, Noah Misch
      and Andres Freund spent considerable time on it.  The original idea for the
      patch came from Simon Riggs, after a problem report by Joel Jacobson.
      Most code is from me, with contributions from Marti Raudsepp, Alexander
      Shulgin, Noah Misch and Andres Freund.
      
      This patch was discussed in several pgsql-hackers threads; the most
      important start at the following message-ids:
      	AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
      	1290721684-sup-3951@alvh.no-ip.org
      	1294953201-sup-2099@alvh.no-ip.org
      	1320343602-sup-2290@alvh.no-ip.org
      	1339690386-sup-8927@alvh.no-ip.org
      	4FE5FF020200002500048A3D@gw.wicourts.gov
      	4FEAB90A0200002500048B7D@gw.wicourts.gov
  26. 19 Jan 2013 (1 commit)
    • Unbreak lock conflict detection for Hot Standby. · d8c38966
      Committed by Robert Haas
      This got broken in the original fast-path locking patch, because
      I failed to account for the fact that Hot Standby startup process
      might take a strong relation lock on a relation in a database to
      which it is not bound, and confused MyDatabaseId with the database
      ID of the relation being locked.
      
      Report and diagnosis by Andres Freund.  Final form of patch by me.
  27. 14 Jan 2013 (1 commit)
    • Prevent very-low-probability PANIC during PREPARE TRANSACTION. · 2065dd28
      Committed by Tom Lane
      The code in PostPrepare_Locks supposed that it could reassign locks to
      the prepared transaction's dummy PGPROC by deleting the PROCLOCK table
      entries and immediately creating new ones.  This was safe when that code
      was written, but since we invented partitioning of the shared lock table,
      it's not safe --- another process could steal away the PROCLOCK entry in
      the short interval when it's on the freelist.  Then, if we were otherwise
      out of shared memory, PostPrepare_Locks would have to PANIC, since it's
      too late to back out of the PREPARE at that point.
      
      Fix by inventing a dynahash.c function to atomically update a hashtable
      entry's key.  (This might possibly have other uses in future.)
      
      This is an ancient bug that in principle we ought to back-patch, but the
      odds of someone hitting it in the field seem really tiny, because (a) the
      risk window is small, and (b) nobody runs servers with maxed-out lock
      tables for long, because they'll be getting non-PANIC out-of-memory errors
      anyway.  So fixing it in HEAD seems sufficient, at least until the new
      code has gotten some testing.
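
      The new dynahash primitive re-keys an entry in place, so the entry never
      passes through the freelist where another backend could grab it.  A
      hedged usage sketch with an invented entry type (the
      hash_update_hash_key() call itself is the function this commit adds; the
      PANIC on failure mirrors the too-late-to-back-out situation the commit
      describes, but is the caller's policy).

        #include "postgres.h"
        #include "utils/hsearch.h"

        typedef struct MyKey   { uint32 a; uint32 b; } MyKey;
        typedef struct MyEntry { MyKey key; int payload; } MyEntry;  /* key first */

        /* Change an existing entry's key without a delete-and-reinsert cycle. */
        static void
        rekey_entry(HTAB *htab, MyEntry *entry, const MyKey *newkey)
        {
            if (!hash_update_hash_key(htab, entry, newkey))
                elog(PANIC, "duplicate key while re-keying entry");
        }
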
  28. 09 Jan 2013 (1 commit)
    • Fix potential corruption of lock table in CREATE/DROP INDEX CONCURRENTLY. · c00dc337
      Committed by Tom Lane
      If VirtualXactLock() has to wait for a transaction that holds its VXID lock
      as a fast-path lock, it must first convert the fast-path lock to a regular
      lock.  It failed to take the required "partition" lock on the main
      shared-memory lock table while doing so.  This is the direct cause of the
      assert failure in GetLockStatusData() recently observed in the buildfarm,
      but more worryingly it could result in arbitrary corruption of the shared
      lock table if some other process were concurrently engaged in modifying the
      same partition of the lock table.  Fortunately, VirtualXactLock() is only
      used by CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY, so the
      opportunities for failure are fewer than they might have been.
      
      In passing, improve some comments and be a bit more consistent about
      order of operations.
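
      A hedged sketch of the ordering the fix establishes when converting a
      fast-path VXID lock into a regular one (heavily simplified from
      VirtualXactLock(); error handling and the surrounding fast-path
      bookkeeping are omitted).

        uint32      hashcode = LockTagHashCode(&tag);
        LWLockId    partitionLock = LockHashPartitionLock(hashcode);

        /* The fix: modifying the main lock table requires holding the lock on
         * the partition that this hashcode maps to. */
        LWLockAcquire(partitionLock, LW_EXCLUSIVE);

        proclock = SetupLockInTable(LockMethods[DEFAULT_LOCKMETHOD], proc,
                                    &tag, hashcode, ExclusiveLock);
        if (proclock)
            GrantLock(proclock->tag.myLock, proclock, ExclusiveLock);

        LWLockRelease(partitionLock);
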
  29. 02 Jan 2013 (1 commit)
  30. 12 Dec 2012 (1 commit)
    • Fix performance problems with autovacuum truncation in busy workloads. · b19e4250
      Committed by Kevin Grittner
      In situations where there are over 8MB of empty pages at the end of
      a table, the truncation work for trailing empty pages takes longer
      than deadlock_timeout, and there is frequent access to the table by
      processes other than autovacuum, there was a problem with the
      autovacuum worker process being canceled by the deadlock checking
      code.  The truncation work done by autovacuum up to that point was
      lost, and the attempt was retried by a later autovacuum worker.  The
      attempts could continue indefinitely without making progress,
      consuming resources and blocking other processes for up to
      deadlock_timeout each time.
      
      This patch has the autovacuum worker checking whether it is
      blocking any other thread at 20ms intervals. If such a condition
      develops, the autovacuum worker will persist the work it has done
      so far, release its lock on the table, and sleep in 50ms intervals
      for up to 5 seconds, hoping to be able to re-acquire the lock and
      try again. If it is unable to get the lock in that time, it moves
      on and a worker will try to continue later from the point this one
      left off.
      
      While this patch doesn't change the rules about when and what to
      truncate, it does cause the truncation to occur sooner, with less
      blocking, and with the consumption of fewer resources when there is
      contention for the table's lock.
      
      The only user-visible change other than improved performance is
      that the table size during truncation may change incrementally
      instead of just once.
      
      This problem exists in all supported versions but is infrequently
      reported, although some reports of performance problems when
      autovacuum runs might be caused by this. Initial commit is just the
      master branch, but this should probably be backpatched once the
      build farm and general developer usage confirm that there are no
      surprising effects.
      
      Jan Wieck
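
      A self-contained model (plain C, not the autovacuum code; the
      lock-manager calls are stubs) of the retry policy described above: do a
      bounded chunk of truncation, look for waiters roughly every 20 ms, and
      if someone is blocked, persist progress, give up the lock, and poll at
      50 ms intervals for up to 5 seconds before leaving the rest to a later
      worker.

        #include <stdbool.h>
        #include <stdio.h>

        #define CHECK_INTERVAL_MS    20     /* how often to look for blocked waiters */
        #define RETRY_INTERVAL_MS    50     /* polling interval while re-locking     */
        #define RETRY_LIMIT_MS     5000     /* give up after this long               */

        /* Stubs standing in for the real lock-manager interactions. */
        static bool someone_is_waiting_on_our_lock(void) { return false; }
        static bool try_reacquire_exclusive_lock(void)   { return true;  }
        static void release_exclusive_lock(void)         { }
        static void sleep_ms(int ms)                     { (void) ms; }

        static bool
        wait_for_lock_again(void)
        {
            for (int waited = 0; waited < RETRY_LIMIT_MS; waited += RETRY_INTERVAL_MS)
            {
                sleep_ms(RETRY_INTERVAL_MS);
                if (try_reacquire_exclusive_lock())
                    return true;
            }
            return false;
        }

        /* Returns true if all trailing pages were truncated, false if a later
         * worker should continue from where this one stopped. */
        static bool
        truncate_with_yield(int pages_to_truncate)
        {
            int ms_since_check = 0;

            while (pages_to_truncate > 0)
            {
                pages_to_truncate--;        /* truncate one chunk of work ...  */
                ms_since_check++;           /* ... pretending it takes 1 ms    */

                if (ms_since_check < CHECK_INTERVAL_MS)
                    continue;
                ms_since_check = 0;

                if (!someone_is_waiting_on_our_lock())
                    continue;

                /* Persist progress so far, yield the lock, then try again. */
                release_exclusive_lock();
                if (!wait_for_lock_again())
                    return false;
            }
            return true;
        }

        int
        main(void)
        {
            printf("finished: %d\n", truncate_with_yield(100));
            return 0;
        }
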
  31. 30 Nov 2012 (1 commit)
  32. 29 Aug 2012 (1 commit)
    • Split resowner.h · 45326c5a
      Committed by Alvaro Herrera
      This lets files that are mere users of ResourceOwner not automatically
      include the headers for stuff that is managed by the resowner mechanism.
  33. 26 Jun 2012 (1 commit)
  34. 21 Jun 2012 (1 commit)
    • Add a small cache of locks owned by a resource owner in ResourceOwner. · eeb6f37d
      Committed by Heikki Linnakangas
      This speeds up reassigning locks to the parent owner, when the transaction
      holds a lot of locks, but only a few of them belong to the current resource
      owner.  This particularly helps pg_dump when dumping a large number of
      objects.
      
      The cache can hold up to 15 locks in each resource owner. After that, the
      cache is marked as overflowed, and we fall back to the old method of
      scanning the whole local lock table. The tradeoff here is that the cache has
      to be scanned whenever a lock is released, so if the cache is too large,
      lock release becomes more expensive. 15 seems enough to cover pg_dump, and
      doesn't have much impact on lock release.
      
      Jeff Janes, reviewed by Amit Kapila and Heikki Linnakangas.
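
      A minimal model (invented types, not the ResourceOwner code itself) of
      the caching scheme: remember up to 15 locks per owner, and once the
      budget is exceeded, mark the cache overflowed and fall back to scanning
      the whole local lock table at release/reassign time.

        #include <stdbool.h>
        #include <stddef.h>

        #define MAX_RESOWNER_LOCKS 15       /* same budget the commit describes */

        typedef struct LocalLockStub { int id; } LocalLockStub;

        typedef struct OwnerLockCache
        {
            int            nlocks;                      /* valid unless overflowed */
            LocalLockStub *locks[MAX_RESOWNER_LOCKS];
            bool           overflowed;
        } OwnerLockCache;

        static void
        remember_lock(OwnerLockCache *owner, LocalLockStub *lock)
        {
            if (owner->overflowed)
                return;                     /* already gave up tracking          */
            if (owner->nlocks >= MAX_RESOWNER_LOCKS)
            {
                owner->overflowed = true;   /* from now on, lock release falls   */
                return;                     /* back to scanning the local table  */
            }
            owner->locks[owner->nlocks++] = lock;
        }

        static void
        forget_lock(OwnerLockCache *owner, LocalLockStub *lock)
        {
            if (owner->overflowed)
                return;                     /* nothing reliable is cached        */
            for (int i = 0; i < owner->nlocks; i++)
            {
                if (owner->locks[i] == lock)
                {
                    owner->locks[i] = owner->locks[--owner->nlocks]; /* swap-remove */
                    return;
                }
            }
        }
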
  35. 11 Jun 2012 (1 commit)
  36. 31 May 2012 (1 commit)
    • Fix two more bugs in fast-path relation locking. · 07ab1383
      Committed by Robert Haas
      First, the previous code failed to account for the fact that, during Hot
      Standby operation, the startup process takes AccessExclusiveLocks on
      relations without setting MyDatabaseId.  This resulted in fast path
      strong lock counts failing to be incremented when the startup process
      took locks, which in turn allowed conflicting lock requests to succeed
      when they should not have.  Report by Erik Rijkers, diagnosis by Heikki
      Linnakangas.
      
      Second, LockReleaseAll() failed to honor the allLocks and lockmethodid
      restrictions with respect to fast-path locks.  It's not clear to me
      whether this produces any user-visible breakage at the moment, but it's
      certainly wrong.  Rearrange order of operations in LockReleaseAll to fix.
      Noted by Tom Lane.
  37. 05 May 2012 (1 commit)
    • Overdue code review for transaction-level advisory locks patch. · 71b9549d
      Committed by Tom Lane
      Commit 62c7bd31 had assorted problems, most
      visibly that it broke PREPARE TRANSACTION in the presence of session-level
      advisory locks (which should be ignored by PREPARE), as per a recent
      complaint from Stephen Rees.  More abstractly, the patch made the
      LockMethodData.transactional flag not merely useless but outright
      dangerous, because in point of fact that flag no longer tells you anything
      at all about whether a lock is held transactionally.  This fix therefore
      removes that flag altogether.  We now rely entirely on the convention
      already in use in lock.c that transactional lock holds must be owned by
      some ResourceOwner, while session holds are never so owned.  Setting the
      locallock struct's owner link to NULL thus denotes a session hold, and
      there is no redundant marker for that.
      
      PREPARE TRANSACTION now works again when there are session-level advisory
      locks, and it is also able to transfer transactional advisory locks to the
      prepared transaction, but for implementation reasons it throws an error if
      we hold both types of lock on a single lockable object.  Perhaps it will be
      worth improving that someday.
      
      Assorted other minor cleanup and documentation editing, as well.
      
      Back-patch to 9.1, except that in the 9.1 branch I did not remove the
      LockMethodData.transactional flag for fear of causing an ABI break for
      any external code that might be examining those structs.
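
      A hedged sketch of the convention the fix relies on, roughly as the
      per-lock scan at PREPARE time looks after this commit (simplified;
      lockOwners is the LOCALLOCK's per-ResourceOwner array and the flag
      variables are illustrative).

        /* owner == NULL marks a session-level hold; anything else is owned by
         * a ResourceOwner and therefore transactional.  PREPARE leaves the
         * former alone and transfers the latter (erroring out on a mix). */
        for (i = 0; i < locallock->numLockOwners; i++)
        {
            if (locallock->lockOwners[i].owner == NULL)
                haveSessionLock = true;
            else
                haveXactLock = true;
        }
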
  38. 18 Apr 2012 (2 commits)
  39. 24 Jan 2012 (1 commit)
    • Resolve timing issue with logging locks for Hot Standby. · c172b7b0
      Committed by Simon Riggs
      We log AccessExclusiveLocks for replay onto standby nodes,
      but because of timing issues on ProcArray it is possible to
      log a lock that is still held by a just committed transaction
      that is very soon to be removed. To avoid any timing issue we
      avoid applying locks made by transactions with InvalidXid.
      
      Simon Riggs, bug report Tom Lane, diagnosis Pavan Deolasee
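
      A hedged sketch of the guard on the standby side (simplified; the real
      check sits where the startup process would otherwise record the lock for
      the given xid).

        /* Locks logged by a transaction whose xid reads as invalid belong to a
         * just-committed transaction that is about to disappear from the
         * ProcArray; there is nothing for the standby to track, so skip them. */
        if (!TransactionIdIsValid(xid))
            return;
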