Commits · dd16f9480ac67ab0c6b0102d110cd5121ed9ab46 · Greenplum / Gpdb

25 Jun 2012 · 1 commit

Replace XLogRecPtr struct with a 64-bit integer. · 0ab9d1c4

Committed by Heikki Linnakangas on Jun 24, 2012

This simplifies code that needs to do arithmetic on XLogRecPtrs.

To avoid changing the on-disk format of data pages, the LSN on data pages is
still stored in the old format. That should keep pg_upgrade happy. However,
we have XLogRecPtrs embedded in the control file, and in the structs that
are sent over the replication protocol, so this change breaks compatibility
between pg_basebackup and the server. I didn't do anything about this in this
patch; per discussion on -hackers, the right thing to do would be to change the
replication protocol to be architecture-independent, so that you could use
a newer version of pg_receivexlog, for example, against an older server
version.
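
A minimal C sketch of why a plain integer simplifies the arithmetic. The two-field struct is a simplified, hypothetical stand-in for the old representation. It splits at 2^32, the same way a page LSN still kept in the old format is widened; the real pre-change offset arithmetic was more involved. The helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Old-style WAL pointer: two 32-bit halves (simplified layout). */
typedef struct
{
    uint32_t xlogid;    /* high half: logical log file number */
    uint32_t xrecoff;   /* low half: byte offset within that file */
} OldXLogRecPtr;

/* New-style WAL pointer: a plain 64-bit byte position. */
typedef uint64_t XLogRecPtr;

/* Ordering old-style pointers needs two-field logic ... */
static int
old_lsn_lt(OldXLogRecPtr a, OldXLogRecPtr b)
{
    return a.xlogid < b.xlogid ||
        (a.xlogid == b.xlogid && a.xrecoff < b.xrecoff);
}

/* ... and advancing one means propagating the carry by hand. */
static OldXLogRecPtr
old_lsn_add(OldXLogRecPtr p, uint32_t nbytes)
{
    uint32_t newoff = p.xrecoff + nbytes;

    if (newoff < p.xrecoff)     /* unsigned wraparound: carry into xlogid */
        p.xlogid++;
    p.xrecoff = newoff;
    return p;
}

/* Widen a two-part value, e.g. a page LSN kept in the old on-disk format. */
static XLogRecPtr
lsn_from_old(OldXLogRecPtr p)
{
    return ((XLogRecPtr) p.xlogid << 32) | p.xrecoff;
}
```

With the 64-bit form, the same operations reduce to `a < b` and `p + nbytes`.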

11 Jun 2012 · 1 commit

Run pgindent on 9.2 source tree in preparation for first 9.3 · 927d61ee

Committed by Bruce Momjian on Jun 10, 2012

commit-fest.

14 May 2012 · 1 commit

Update comments that became out-of-date with the PGXACT struct. · 9e4637bf

Committed by Heikki Linnakangas on May 14, 2012

When the "hot" members of PGPROC were split off to separate PGXACT structs,
many PGPROC fields referred to in comments were moved to PGXACT, but the
comments were neglected in the commit. Mostly this is just a search/replace
of PGPROC with PGXACT, but the way the dummy PGPROC entries are created for
prepared transactions changed more, making some of the comments totally
bogus.

Noah Misch

10 May 2012 · 1 commit

Improve control logic for bgwriter hibernation mode. · 6308ba05

Committed by Tom Lane on May 09, 2012

Commit 6d90eaaa added a hibernation mode
to the bgwriter to reduce the server's idle-power consumption. However,
its interaction with the detailed behavior of BgBufferSync's feedback
control loop wasn't very well thought out. That control loop depends
primarily on the rate of buffer allocation, not the rate of buffer
dirtying, so the hibernation mode has to be designed to operate only when
no new buffer allocations are happening. Also, the check for whether the
system is effectively idle was not quite right and would fail to detect
a constant low level of activity, thus allowing the bgwriter to go into
hibernation mode in a way that would let the cycle time vary quite a bit,
possibly further confusing the feedback loop. To fix, move the wakeup
support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into
StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode
unless no buffer allocations have happened recently.

In addition, fix the delaying logic to remove the problem of possibly not
responding to signals promptly, which was basically caused by trying to use
the process latch's is_set flag for multiple purposes. I can't prove it
but I'm suspicious that that hack was responsible for the intermittent
"postmaster does not shut down" failures we've been seeing in the buildfarm
lately. In any case it did nothing to improve the readability or
robustness of the code.

In passing, express the hibernation sleep time as a multiplier on
BgWriterDelay, not a constant. I'm not sure whether there's any value in
exposing the longer sleep time as an independently configurable setting,
but we can at least make it act like this for little extra code.
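
Here is a hedged sketch of the resulting sleep-time rule. The multiplier of 50 and every name here are assumptions for illustration. The bgwriter sleeps BgWriterDelay normally and stretches that to a multiple only when its last pass had nothing to do and no buffer allocations happened recently. The allocation flag stands in for the state that StrategyGetBuffer now tracks.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_HIBERNATE_FACTOR 50    /* assumed multiplier on BgWriterDelay */

/* Returns the next sleep time in ms for a hypothetical bgwriter loop. */
static long
demo_bgwriter_sleep_ms(int bgwriter_delay_ms, bool nothing_to_do,
                       bool recent_allocations)
{
    assert(bgwriter_delay_ms > 0);

    /* Hibernate only when idle AND no buffers are being allocated. */
    if (nothing_to_do && !recent_allocations)
        return (long) bgwriter_delay_ms * DEMO_HIBERNATE_FACTOR;
    return bgwriter_delay_ms;
}
```

Expressing the long sleep as a multiple keeps a single tunable, BgWriterDelay, in charge of both rates.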

09 May 2012 · 1 commit

Reduce idle power consumption of walwriter and checkpointer processes. · 5461564a

Committed by Tom Lane on May 08, 2012

This patch modifies the walwriter process so that, when it has not found
anything useful to do for many consecutive wakeup cycles, it extends its
sleep time to reduce the server's idle power consumption. It reverts to
normal as soon as it's done any successful flushes. It's still true that
during any async commit, backends check for completed, unflushed pages of
WAL and signal the walwriter if there are any; so that in practice the
walwriter can get awakened and returned to normal operation sooner than the
sleep time might suggest.
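
The idle-cycle countdown can be sketched as follows. The constants and names are hypothetical, and `delay_ms` stands in for the walwriter's configured wakeup interval. Each successful flush resets the countdown. Once enough consecutive idle cycles have passed, the requested sleep is stretched.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_LOOPS_UNTIL_HIBERNATE 50   /* idle cycles before long sleeps (assumed) */
#define DEMO_HIBERNATE_FACTOR      25   /* sleep multiplier when idle (assumed) */

typedef struct
{
    int left_till_hibernate;            /* idle cycles remaining before long sleeps */
} DemoWalWriter;

/* Records one wakeup cycle and returns how long to sleep next, in ms. */
static long
demo_walwriter_cycle(DemoWalWriter *w, bool flushed_something, int delay_ms)
{
    assert(delay_ms > 0);

    if (flushed_something)
        w->left_till_hibernate = DEMO_LOOPS_UNTIL_HIBERNATE;   /* back to normal */
    else if (w->left_till_hibernate > 0)
        w->left_till_hibernate--;

    if (w->left_till_hibernate > 0)
        return delay_ms;
    return (long) delay_ms * DEMO_HIBERNATE_FACTOR;
}
```

Backends can still signal the walwriter during async commits, so a real process may be woken long before the stretched sleep ends. The sketch models only how long it asks to sleep.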

Also, improve the checkpointer so that it uses a latch and a computed delay
time to not wake up at all except when it has something to do, replacing a
previous hardcoded 0.5 sec wakeup cycle. This also is primarily useful for
reducing the server's power consumption when idle.

In passing, get rid of the dedicated latch for signaling the walwriter in
favor of using its procLatch, since that comports better with possible
generic signal handlers using that latch. Also, fix a pre-existing bug
with failure to save/restore errno in walwriter's signal handlers.

Peter Geoghegan, somewhat simplified by Tom

05 May 2012 · 1 commit

Overdue code review for transaction-level advisory locks patch. · 71b9549d

Committed by Tom Lane on May 04, 2012

Commit 62c7bd31 had assorted problems, most
visibly that it broke PREPARE TRANSACTION in the presence of session-level
advisory locks (which should be ignored by PREPARE), as per a recent
complaint from Stephen Rees. More abstractly, the patch made the
LockMethodData.transactional flag not merely useless but outright
dangerous, because in point of fact that flag no longer tells you anything
at all about whether a lock is held transactionally. This fix therefore
removes that flag altogether. We now rely entirely on the convention
already in use in lock.c that transactional lock holds must be owned by
some ResourceOwner, while session holds are never so owned. Setting the
locallock struct's owner link to NULL thus denotes a session hold, and
there is no redundant marker for that.
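
Below is a minimal sketch of the ownership convention and the PREPARE TRANSACTION rule described here. The names are loosely modeled on lock.c and are hypothetical, not its actual code. A hold with a NULL owner is a session hold, any other hold is transactional, and an object held both ways cannot be prepared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoResourceOwner { const char *name; } DemoResourceOwner;

/* One hold on a lockable object: a NULL owner means a session-level hold. */
typedef struct
{
    DemoResourceOwner *owner;
    int                nLocks;
} DemoLockHold;

/*
 * PREPARE TRANSACTION check for one lockable object:
 *   0  -> only session holds (ignored by PREPARE)
 *   1  -> only transactional holds (transferred to the prepared xact)
 *   -1 -> both kinds on the same object (PREPARE throws an error)
 */
static int
demo_prepare_lock_check(const DemoLockHold *holds, int nholds)
{
    bool have_session = false;
    bool have_xact = false;

    assert(nholds >= 0);
    for (int i = 0; i < nholds; i++)
    {
        if (holds[i].nLocks <= 0)
            continue;
        if (holds[i].owner == NULL)
            have_session = true;
        else
            have_xact = true;
    }
    if (have_session && have_xact)
        return -1;
    return have_xact ? 1 : 0;
}
```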

PREPARE TRANSACTION now works again when there are session-level advisory
locks, and it is also able to transfer transactional advisory locks to the
prepared transaction, but for implementation reasons it throws an error if
we hold both types of lock on a single lockable object. Perhaps it will be
worth improving that someday.

Assorted other minor cleanup and documentation editing, as well.

Back-patch to 9.1, except that in the 9.1 branch I did not remove the
LockMethodData.transactional flag for fear of causing an ABI break for
any external code that might be examining those structs.

18 Apr 2012 · 1 commit

Tighten up error recovery for fast-path locking. · 53c5b869

Committed by Robert Haas on Apr 18, 2012

The previous code could cause a backend crash after BEGIN; SAVEPOINT a;
LOCK TABLE foo (interrupted by ^C or statement timeout); ROLLBACK TO
SAVEPOINT a; LOCK TABLE foo, and might have leaked strong-lock counts
in other situations.

Report by Zoltán Böszörményi; patch review by Jeff Davis.

22 Mar 2012 · 1 commit

Clean up compiler warnings from unused variables with asserts disabled · 0e85abd6

Committed by Peter Eisentraut on Mar 21, 2012

For those variables only used when asserts are enabled, use a new
macro PG_USED_FOR_ASSERTS_ONLY, which expands to
__attribute__((unused)) when asserts are not enabled.
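
This sketch shows how such a macro can be defined and used. Here NDEBUG stands in for "asserts disabled", whereas PostgreSQL keys off its own assertion configuration and Assert macro. The GCC-specific attribute is guarded by `__GNUC__`. `clear_low_bit` is a hypothetical example function.

```c
#include <assert.h>

/* Empty when asserts are active; marks the variable as deliberately
 * unused when they are compiled out, silencing -Wunused warnings. */
#if defined(NDEBUG) && defined(__GNUC__)
#define PG_USED_FOR_ASSERTS_ONLY __attribute__((unused))
#else
#define PG_USED_FOR_ASSERTS_ONLY
#endif

/* 'before' is read only by the assertion, so an NDEBUG build would
 * otherwise warn that it is unused. */
static unsigned
clear_low_bit(unsigned flags)
{
    unsigned before PG_USED_FOR_ASSERTS_ONLY = flags;

    flags &= ~1u;
    assert((before & ~1u) == flags && (flags & 1u) == 0);
    return flags;
}
```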

30 Jan 2012 · 1 commit

Make group commit more effective. · 9b38d46d

Committed by Heikki Linnakangas on Jan 30, 2012

When a backend needs to flush the WAL, and someone else is already flushing
the WAL, wait until it releases WALWriteLock and check if we still need
to do the flush or if the other backend already did the work for us, before
acquiring WALWriteLock. This helps group commit, because when the WAL flush
finishes, all the backends that were waiting for it can be woken up in one
go, and they can all concurrently observe that they're done, rather than
waking them up one by one in a cascading fashion.

This is based on a new LWLock function, LWLockWaitUntilFree(), which has
peculiar semantics. If the lock is immediately free, it grabs the lock and
returns true. If it's not free, it waits until it is released, but then
returns false without grabbing the lock. This is used in XLogFlush(), so
that when the lock is acquired, the backend flushes the WAL, but if it's
not, the backend first checks the current flush location before retrying.

Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although
this patch as committed ended up being very different from that.
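
Here is a simplified model of the LWLockWaitUntilFree() semantics described above. It is built on POSIX mutexes and condition variables rather than PostgreSQL's LWLock and semaphore machinery. DemoLock and its functions are hypothetical names. Release uses a broadcast, which lets every waiter wake in one go.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct
{
    pthread_mutex_t mutex;      /* protects 'held' */
    pthread_cond_t  released;   /* broadcast on every release */
    bool            held;
} DemoLock;

#define DEMO_LOCK_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* If free: take the lock and return true. Otherwise wait until it is
 * observed free, then return false WITHOUT taking it. (This model keeps
 * waiting if someone re-acquired it before we woke.) */
static bool
demo_wait_until_free(DemoLock *lock)
{
    bool acquired;

    pthread_mutex_lock(&lock->mutex);
    if (!lock->held)
    {
        lock->held = true;
        acquired = true;
    }
    else
    {
        while (lock->held)
            pthread_cond_wait(&lock->released, &lock->mutex);
        acquired = false;
    }
    pthread_mutex_unlock(&lock->mutex);
    return acquired;
}

static void
demo_release(DemoLock *lock)
{
    pthread_mutex_lock(&lock->mutex);
    assert(lock->held);
    lock->held = false;
    pthread_cond_broadcast(&lock->released);    /* wake all waiters at once */
    pthread_mutex_unlock(&lock->mutex);
}
```

A flush routine in the spirit of XLogFlush() would loop. It first checks whether WAL is already flushed far enough. If not, it calls demo_wait_until_free(). On true it flushes and releases. On false it returns to the check, since the previous holder may already have done the work.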

28 Jan 2012 · 1 commit

Initialize the new bgwriterLatch field properly. · cf3fff63

Committed by Heikki Linnakangas on Jan 27, 2012

Peter Geoghegan

02 Jan 2012 · 1 commit

Update copyright notices for year 2012. · e126958c

Committed by Bruce Momjian on Jan 01, 2012

25 Nov 2011 · 1 commit

Move "hot" members of PGPROC into a separate PGXACT array. · ed0b409d

Committed by Robert Haas on Nov 25, 2011

This speeds up snapshot-taking and reduces ProcArrayLock contention.
Also, the PGPROC (and PGXACT) structures used by two-phase commit are
now allocated as part of the main array, rather than in a separate
array, and we keep ProcArray sorted in pointer order. These changes
are intended to minimize the number of cache lines that must be pulled
in to take a snapshot, and testing shows a substantial increase in
performance on both read and write workloads at high concurrencies.

Pavan Deolasee, Heikki Linnakangas, Robert Haas
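
This sketch illustrates the hot/cold split. Per-backend fields that every snapshot reads live in a small, densely packed array, separate from the large per-process struct. The field names echo PGXACT, but the types are simplified stand-ins. The XID comparison ignores wraparound for brevity.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Cold per-backend state: large and rarely read while taking snapshots. */
typedef struct
{
    int  pid;
    char other_state[256];      /* stand-in for locks, latch, wait queues... */
} DemoProc;

/* Hot per-backend state: small and packed densely in its own array. */
typedef struct
{
    TransactionId xid;          /* running top-level XID, or invalid */
    TransactionId xmin;
    uint8_t       vacuumFlags;
} DemoXact;

/* Snapshot-style scan that touches only the dense hot array. Uses plain
 * '<' for brevity; real code needs wraparound-aware XID comparisons. */
static TransactionId
demo_oldest_running_xid(const DemoXact *xacts, int n,
                        TransactionId latest_completed)
{
    TransactionId result = latest_completed + 1;

    for (int i = 0; i < n; i++)
    {
        TransactionId xid = xacts[i].xid;

        if (xid != InvalidTransactionId && xid < result)
            result = xid;
    }
    return result;
}
```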

02 Nov 2011 · 1 commit

Initialize myProcLocks queues just once, at postmaster startup. · c2891b46

Committed by Robert Haas on Nov 01, 2011

In assert-enabled builds, we assert during the shutdown sequence that
the queues have been properly emptied, and during process startup that
we are inheriting empty queues.  In non-assert enabled builds, we just
save a few cycles.

10 Sep 2011 · 1 commit

Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h. · a7801b62

Committed by Tom Lane on Sep 09, 2011

As per my recent proposal, this refactors things so that these typedefs and
macros are available in a header that can be included in frontend-ish code.
I also changed various headers that were undesirably including
utils/timestamp.h to include datatype/timestamp.h instead.  Unsurprisingly,
this showed that half the system was getting utils/timestamp.h by way of
xlog.h.

No actual code changes here, just header refactoring.

11 Aug 2011 · 1 commit

Change the autovacuum launcher to use WaitLatch instead of a poll loop. · 4dab3d5a

Committed by Tom Lane on Aug 10, 2011

In pursuit of this (and with the expectation that WaitLatch will be needed
in more places), convert the latch field that was already added to PGPROC
for sync rep into a generic latch that is activated for all PGPROC-owning
processes, and change many of the standard backend signal handlers to set
that latch when a signal happens. This will allow WaitLatch callers to be
wakened properly by these signals.

In passing, fix a whole bunch of signal handlers that had been hacked to do
things that might change errno, without adding the necessary save/restore
logic for errno. Also make some minor fixes in unix_latch.c, and clean
up a bizarre and unsafe scheme for disowning the process's latch. Much of
this has to be back-patched into 9.1.

Peter Geoghegan, with additional work by Tom
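
The errno rule these handler fixes enforce can be sketched as follows. `wake_main_loop()` is a hypothetical stand-in for a latch-setting call, which may make system calls and so clobber errno. The handler saves errno on entry and restores it before returning, so the interrupted code never sees a changed value.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>

static volatile sig_atomic_t got_sighup = 0;

/* Hypothetical stand-in for SetLatch(): simulates a system call inside
 * it failing and overwriting errno. */
static void
wake_main_loop(void)
{
    errno = EINTR;
}

static void
demo_sighup_handler(int signo)
{
    int save_errno = errno;     /* save before doing anything else */

    (void) signo;
    got_sighup = 1;
    wake_main_loop();           /* may change errno */

    errno = save_errno;         /* restore for the interrupted code */
}
```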

10 Aug 2011 · 1 commit

Documentation improvement and minor code cleanups for the latch facility. · 4e15a4db

Committed by Tom Lane on Aug 09, 2011

Improve the documentation around weak-memory-ordering risks, and do a pass
of general editorialization on the comments in the latch code. Make the
Windows latch code more like the Unix latch code where feasible; in
particular provide the same Assert checks in both implementations.
Fix poorly-placed WaitLatch call in syncrep.c.

This patch resolves, for the moment, concerns around weak-memory-ordering
bugs in latch-related code: we have documented the restrictions and checked
that existing calls meet them. In 9.2 I hope that we will install suitable
memory barrier instructions in SetLatch/ResetLatch, so that their callers
don't need to be quite so careful.

03 Aug 2011 · 2 commits

Move CheckRecoveryConflictDeadlock() call to a safer place. · ac36e6f7

Committed by Tom Lane on Aug 02, 2011

This kluge was inserted in a spot apparently chosen at random: the lock
manager's state is not yet fully set up for the wait, and in particular
LockWaitCancel hasn't been armed by setting lockAwaited, so the ProcLock
will not get cleaned up if the ereport is thrown.  This seems to not cause
any observable problem in trivial test cases, because LockReleaseAll will
silently clean up the debris; but I was able to cause failures with tests
involving subtransactions.

Fixes breakage induced by commit c85c9414.
Back-patch to all affected branches.

Fix incorrect initialization of ProcGlobal->startupBufferPinWaitBufId. · 2e53bd55

Committed by Tom Lane on Aug 02, 2011

It was initialized in the wrong place and to the wrong value.  With bad
luck this could result in incorrect query-cancellation failures in hot
standby sessions, should an HS backend be holding a pin on buffer number 1
while trying to acquire a lock.

18 Jul 2011 · 1 commit

Create a "fast path" for acquiring weak relation locks. · 3cba8999

Committed by Robert Haas on May 28, 2011

When an AccessShareLock, RowShareLock, or RowExclusiveLock is requested
on an unshared database relation, and we can verify that no conflicting
locks can possibly be present, record the lock in a per-backend queue,
stored within the PGPROC, rather than in the primary lock table.  This
eliminates a great deal of contention on the lock manager LWLocks.
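
Here is a simplified, single-process sketch of the fast-path decision. The slot count, partition count, and names are assumptions. Lock-mode numbering follows PostgreSQL's order from AccessShareLock upward. The real implementation also needs per-backend locking and memory-ordering care so that a backend taking a strong lock can transfer existing fast-path entries into the main table. None of that is modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    DemoAccessShare = 1, DemoRowShare, DemoRowExclusive,
    DemoShareUpdateExclusive, DemoShare, DemoShareRowExclusive,
    DemoExclusive, DemoAccessExclusive
} DemoLockMode;

#define DEMO_FP_SLOTS          16      /* per-backend slots (assumed) */
#define DEMO_STRONG_PARTITIONS 1024    /* hashed buckets of strong-lock counts */

/* Shared: number of strong locks held or requested per hash bucket. */
static uint32_t strong_lock_counts[DEMO_STRONG_PARTITIONS];

/* Per-backend fast-path queue; modes[i] == 0 marks a free slot. */
typedef struct
{
    uint32_t relid[DEMO_FP_SLOTS];
    uint8_t  modes[DEMO_FP_SLOTS];     /* bitmask of weak modes held */
} DemoFastPath;

/* true: recorded locally; false: caller must use the main lock table. */
static bool
demo_fastpath_acquire(DemoFastPath *fp, uint32_t relid, bool shared_rel,
                      DemoLockMode mode)
{
    int free_slot = -1;

    if (shared_rel || mode > DemoRowExclusive)
        return false;                   /* only weak locks on unshared rels */
    if (strong_lock_counts[relid % DEMO_STRONG_PARTITIONS] != 0)
        return false;                   /* a conflicting lock may exist */

    for (int i = 0; i < DEMO_FP_SLOTS; i++)
    {
        if (fp->modes[i] != 0 && fp->relid[i] == relid)
        {
            fp->modes[i] |= (uint8_t) (1u << mode);
            return true;
        }
        if (fp->modes[i] == 0 && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;                   /* queue full */
    fp->relid[free_slot] = relid;
    fp->modes[free_slot] = (uint8_t) (1u << mode);
    return true;
}
```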

This patch also refactors the interface between GetLockStatusData() and
pg_lock_status() to be a bit more abstract, so that we don't rely so
heavily on the lock manager's internal representation details.  The new
fast path lock structures don't have a LOCK or PROCLOCK structure to
return, so we mustn't depend on that for purposes of listing outstanding
locks.

Review by Jeff Davis.

29 Jun 2011 · 1 commit

Unify spelling of "canceled", "canceling", "cancellation" · 21f1e15a

Committed by Peter Eisentraut on Jun 29, 2011

We had previously (af26857a)
established the U.S. spellings as standard.

19 Jun 2011 · 1 commit

Capitalization fixes · 8a8fbe7e

Committed by Peter Eisentraut on Jun 19, 2011

17 Jun 2011 · 1 commit

Fix minor thinko in ProcGlobalShmemSize(). · c573486c

Committed by Robert Haas on Jun 17, 2011

There's no need to add space for startupBufferPinWaitBufId, because
it's part of the PROC_HDR object for which this function already
allocates space.

This has been wrong for a while, but the only consequence is that our
shared memory allocation is increased by 4 bytes, so no back-patch.

12 Jun 2011 · 1 commit

Code cleanup for InitProcGlobal. · 47ebcecc

Committed by Robert Haas on Jun 12, 2011

The old code creates three separate arrays when only one is needed,
using two different shmem allocation functions for no obvious reason.
It also strangely splits up the initialization of AuxiliaryProcs
between the top and bottom of the function to no evident purpose.

Review by Tom Lane.

07 Mar 2011 · 1 commit

Efficient transaction-controlled synchronous replication. · a8a8a3e0

Committed by Simon Riggs on Mar 06, 2011

If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to take over in case of standby failure.

This version has very strict behaviour; more relaxed options
may be added at a later date.

Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime
Casanova, Heikki Linnakangas and Robert Haas, plus the assistance
of many other design reviewers.

18 Feb 2011 · 1 commit

Add transaction-level advisory locks. · 62c7bd31

Committed by Itagaki Takahiro on Feb 18, 2011

They share the same locking namespace with the existing session-level
advisory locks, but they are automatically released at the end of the
current transaction and cannot be released explicitly via unlock
functions.

Marko Tiikkaja, reviewed by me.

02 Jan 2011 · 1 commit

Stamp copyrights for year 2011. · 5d950e3b

Committed by Bruce Momjian on Jan 01, 2011

21 Sep 2010 · 1 commit

Remove cvs keywords from all files. · 9f2e2113

Committed by Magnus Hagander on Sep 20, 2010

24 Aug 2010 · 1 commit

Marginal code cleanup for streaming replication. · b9defe04

Committed by Tom Lane on Aug 23, 2010

There is no reason that proc.c should have to get involved in this dirty hack
for letting the postmaster know which children are walsenders. Revert that
file to the way it was, and confine the kluge to pmsignal.c and postmaster.c.

07 Jul 2010 · 1 commit

pgindent run for 9.0, second run · 239d769e

Committed by Bruce Momjian on Jul 06, 2010

04 Jul 2010 · 1 commit

Replace max_standby_delay with two parameters, max_standby_archive_delay and · e76c1a0f

Committed by Tom Lane on Jul 03, 2010

max_standby_streaming_delay, and revise the implementation to avoid assuming
that timestamps found in WAL records can meaningfully be compared to clock
time on the standby server. Instead, the delay limits are compared to the
elapsed time since we last obtained a new WAL segment from archive or since
we were last "caught up" to WAL data arriving via streaming replication.
This avoids problems with clock skew between primary and standby, as well
as other corner cases that the original coding would misbehave in, such
as the primary server having significant idle time between transactions.
Per my complaint some time ago and considerable ensuing discussion.
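
The revised comparison can be sketched as follows, assuming a millisecond clock on the standby (all names are hypothetical). `max_delay_ms` would be max_standby_archive_delay or max_standby_streaming_delay depending on the WAL source, and -1 means wait forever. Only standby-local times enter the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t DemoTimestampMs;    /* hypothetical: standby-local clock, ms */

/*
 * 'last_receipt' is when this standby last obtained a WAL segment from the
 * archive, or last caught up with streamed WAL, read from its own clock.
 * No WAL-record timestamp is involved, so primary/standby clock skew
 * cannot shrink or stretch the grace period.
 */
static bool
demo_standby_delay_expired(DemoTimestampMs now, DemoTimestampMs last_receipt,
                           int max_delay_ms)
{
    assert(now >= last_receipt);

    if (max_delay_ms < 0)
        return false;               /* -1: wait forever, never cancel */
    return now - last_receipt >= max_delay_ms;
}
```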

Do some desultory editing on the hot standby documentation, too.

27 May 2010 · 1 commit

HS Defer buffer pin deadlock check until deadlock_timeout has expired. · f9dbac94

Committed by Simon Riggs on May 26, 2010

During Hot Standby we need to check for buffer pin deadlocks when the
Startup process begins to wait, in case it never wakes up again. We
previously made the deadlock check immediately on the basis that it was
cheap, though clearer thinking and prima facie evidence show that
was too simple. Refactor existing code to make it easy to add in
deferral of the deadlock check until deadlock_timeout, allowing a good
reduction in deadlock checks since far fewer buffer pins are held for
that duration. It's worth doing anyway, though the major goal is to
prevent further reports of context switching with high numbers of
users in occasional tests.

29 Apr 2010 · 1 commit

Modify ShmemInitStruct and ShmemInitHash to throw errors internally, · 77acab75

Committed by Tom Lane on Apr 28, 2010

rather than returning NULL for some-but-not-all failures as they used to.
Remove now-redundant tests for NULL from call sites.

We had to do something about this because many call sites were failing to
check for NULL; and changing it like this seems a lot more useful and
mistake-proof than adding checks to the call sites without them.
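
Here is a toy model of the calling convention this change gives ShmemInitStruct. The function either returns a usable pointer or raises an error itself, so callers can drop their NULL checks. The fixed-size arena, the name index, and all identifiers are hypothetical, and exit() stands in for ereport(ERROR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ARENA_SIZE  4096
#define DEMO_MAX_ENTRIES 16
#define DEMO_ALIGN(len) \
    (((len) + _Alignof(max_align_t) - 1) & ~(size_t) (_Alignof(max_align_t) - 1))

typedef struct
{
    char   name[32];
    void  *ptr;
    size_t size;
} DemoShmemEntry;

static union { max_align_t align_; char bytes[DEMO_ARENA_SIZE]; } demo_arena;
static size_t         demo_arena_used;
static DemoShmemEntry demo_index[DEMO_MAX_ENTRIES];
static int            demo_nentries;

/* Stand-in for ereport(ERROR, ...): report and never return. */
static void
demo_error(const char *msg, const char *name)
{
    fprintf(stderr, "ERROR: %s \"%s\"\n", msg, name);
    exit(1);
}

/* Returns a valid pointer or does not return at all. */
static void *
demo_shmem_init_struct(const char *name, size_t size, bool *found)
{
    size_t alloc = DEMO_ALIGN(size);

    assert(name != NULL && found != NULL);
    if (strlen(name) >= sizeof(demo_index[0].name))
        demo_error("shared memory entry name too long", name);

    for (int i = 0; i < demo_nentries; i++)
    {
        if (strcmp(demo_index[i].name, name) != 0)
            continue;
        if (demo_index[i].size != size)
            demo_error("shared memory entry size is wrong for", name);
        *found = true;                  /* attach to the existing struct */
        return demo_index[i].ptr;
    }

    if (demo_nentries >= DEMO_MAX_ENTRIES ||
        alloc > DEMO_ARENA_SIZE - demo_arena_used)
        demo_error("not enough shared memory for data structure", name);

    DemoShmemEntry *e = &demo_index[demo_nentries++];
    snprintf(e->name, sizeof(e->name), "%s", name);
    e->ptr = demo_arena.bytes + demo_arena_used;
    e->size = size;
    demo_arena_used += alloc;
    *found = false;                     /* caller must initialize it */
    return e->ptr;
}
```

Each caller then reduces to fetching the pointer and running first-time initialization when `found` is false, with no NULL test.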

26 Feb 2010 · 1 commit

pgindent run for 9.0 · 65e806cb

Committed by Bruce Momjian on Feb 26, 2010

13 Feb 2010 · 1 commit

Re-enable max_standby_delay = -1 using deadlock detection on startup · b95a720a

Committed by Simon Riggs on Feb 13, 2010

process. If startup waits on a buffer pin we send a request to all
backends to cancel themselves if they are holding the buffer pin
required and they are also waiting on a lock. If not, startup waits
until max_standby_delay before cancelling any backend waiting for
the requested buffer pin.

08 Feb 2010 · 1 commit

Remove old-style VACUUM FULL (which was known for a little while as · 0a469c87

Committed by Tom Lane on Feb 08, 2010

VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
Per discussion, the use case for this method of vacuuming is no longer large
enough to justify maintaining it; not to mention that we don't wish to invest
the work that would be needed to make it play nicely with Hot Standby.

Aside from the code directly related to old-style VACUUM FULL, this commit
removes support for certain WAL record types that could only be generated
within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
nontransactional generation of cache invalidation sinval messages (the last
being the sticking point for Hot Standby).

We still have to retain all code that copes with finding HEAP_MOVED_OFF and
HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long
as we want to support in-place update from pre-9.0 databases.

24 Jan 2010 · 1 commit

In HS, Startup process sets SIGALRM when waiting for buffer pin. If · 959ac58c

Committed by Simon Riggs on Jan 23, 2010

woken by alarm we send SIGUSR1 to all backends requesting that they
check to see if they are blocking Startup process. If so, they throw
ERROR/FATAL as for other conflict resolutions. Deadlock stop gap
removed. max_standby_delay = -1 option removed to prevent deadlock.

16 Jan 2010 · 1 commit

Teach standby conflict resolution to use SIGUSR1 · a8ce974c

Committed by Simon Riggs on Jan 16, 2010

Conflict reason is passed through directly to the backend, so we can
take decisions about the effect of the conflict based upon the local
state. No specific changes, as yet, though this prepares for later work.
CancelVirtualTransaction() sends signals while holding ProcArrayLock.
Introduce errdetail_abort() to give message detail explaining that the
abort was caused by conflict processing. Remove CONFLICT_MODE states
in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.

15 Jan 2010 · 1 commit

Introduce Streaming Replication. · 40f908bd

Committed by Heikki Linnakangas on Jan 15, 2010

This includes two new kinds of postmaster processes, walsenders and
walreceiver. Walreceiver is responsible for connecting to the primary server
and streaming WAL to disk, while walsender runs in the primary server and
streams WAL from disk to the client.

Documentation still needs work, but the basics are there. We will probably
pull the replication section to a new chapter later on, as well as the
sections describing file-based replication. But let's do that as a separate
patch, so that it's easier to see what has been added/changed. This patch
also adds a new section to the chapter about FE/BE protocol, documenting the
protocol used by walsender/walreceiver.

Bump catalog version because of two new functions,
pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
monitoring the progress of replication.

Fujii Masao, with additional hacking by me

03 Jan 2010 · 1 commit

Update copyright for the year 2010. · 02398008

Committed by Bruce Momjian on Jan 02, 2010

19 Dec 2009 · 1 commit

Allow read only connections during recovery, known as Hot Standby. · efc16ea5

Committed by Simon Riggs on Dec 19, 2009

Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter a consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though they introduce four new types of WAL record.

New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.