Commits · 44ede1072dfcdd4291225b55439a549c8a6cc2b7 · Greenplum / Gpdb

16 Nov, 2007 4 commits

Re-run pgindent with updated list of typedefs. (Updated README should · f6e8730d

Committed by Bruce Momjian on Nov 15, 2007

avoid this problem in the future.)

When logging the recovery.conf parameters, show them quoted as they would · b30769ee

Committed by Peter Eisentraut on Nov 15, 2007

appear in the configuration file.

pgindent run for 8.3. · fdf5a5ef
Committed by Bruce Momjian on Nov 15, 2007

Prevent re-use of a deleted relation's relfilenode until after the next · 6cc4451b

Committed by Tom Lane on Nov 15, 2007

checkpoint. This guards against an unlikely data-loss scenario in which
we re-use the relfilenode, then crash, then replay the deletion and
recreation of the file. Even then we'd be OK if all insertions into the
new relation had been WAL-logged ... but that's not guaranteed given all
the no-WAL-logging optimizations that have recently been added.

Patch by Heikki Linnakangas, per a discussion last month.

13 Oct, 2007 1 commit

When telling the bgwriter that we need a checkpoint because too much xlog · 5c8eb929

Committed by Tom Lane on Oct 12, 2007

has been consumed, recheck against the latest value of RedoRecPtr before
really sending the signal. This avoids useless checkpoint activity if
XLogWrite is executed when we have a very stale local copy of RedoRecPtr.
The potential for useless checkpoint is very much worse in 8.3 because of
the walwriter process (which never does XLogInsert), so while this behavior
was intentional, it needs to be changed. Per report from Itagaki Takahiro.
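
The recheck described above can be sketched roughly as follows. This is an illustration only, not the xlog.c code: `SharedState`, `Backend`, and the integer "segment" units are hypothetical stand-ins for shared memory, a process holding a local copy of RedoRecPtr, and WAL positions.

```python
from dataclasses import dataclass


@dataclass
class SharedState:
    """Hypothetical stand-in for shared memory holding the current redo pointer."""
    redo_ptr: int = 0


class Backend:
    """A process that keeps a local, possibly stale, copy of the redo pointer."""

    def __init__(self, shared, segs_per_checkpoint):
        self.shared = shared
        self.local_redo_ptr = shared.redo_ptr
        self.segs_per_checkpoint = segs_per_checkpoint
        self.signals_sent = 0

    def _too_much_xlog(self, write_ptr):
        # True when enough WAL has been consumed since the known redo point.
        return write_ptr - self.local_redo_ptr >= self.segs_per_checkpoint

    def after_segment_write(self, write_ptr):
        if self._too_much_xlog(write_ptr):
            # Cheap check passed on the local copy; refresh it and recheck
            # before bothering the checkpointing process.
            self.local_redo_ptr = self.shared.redo_ptr
            if self._too_much_xlog(write_ptr):
                self.signals_sent += 1  # stands in for sending the signal
```

A process that never refreshes its local copy on its own (the commit names the walwriter as such a case) would otherwise request a checkpoint after every segment once the copy goes stale; with the recheck, the request is only sent when the latest shared value still justifies it.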

01 Oct, 2007 1 commit

Adjust recovery PS display as agreed with Simon: 'waiting for XXX' · ab051bd2

Committed by Tom Lane on Sep 30, 2007

while the restore_command does its thing, then 'recovering XXX' while
processing the segment file. These operations are heavyweight enough
that an extra PS display set shouldn't bother anyone.

30 Sep, 2007 1 commit

Make recovery show the current input WAL segment name in the startup · 77ccbe64

Committed by Tom Lane on Sep 29, 2007

process' PS display. After a suggestion by Simon (not exactly his
patch though).

29 Sep, 2007 1 commit

Make archive recovery always start a new timeline, rather than only when a · b46bd55a

Committed by Tom Lane on Sep 29, 2007

recovery stop time was used. This avoids a corner-case risk of trying to
overwrite an existing archived copy of the last WAL segment, and seems
simpler and cleaner all around than the original definition. Per example
from Jon Colverson and subsequent analysis by Simon.

27 Sep, 2007 1 commit

Minor improvements in backup and recovery: · f18dfc48

Committed by Tom Lane on Sep 26, 2007

- create a separate archive_mode GUC, on which archive_command is dependent

- %r option in recovery.conf sends last restartpoint to recovery command

- %r used in pg_standby, updated README

- minor other code cleanup in pg_standby

- doc on Warm Standby now mentions pg_standby and %r

- log_restartpoints recovery option emits LOG message at each restartpoint

- end of recovery now displays last transaction end time, as requested
  by Warren Little; also shown at each restartpoint

- restart archiver if needed to carry away WAL files at shutdown

Simon Riggs

09 Sep, 2007 1 commit

Replace the former method of determining snapshot xmax --- to wit, calling · 6bd4f401

Committed by Tom Lane on Sep 08, 2007

ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid"
variable that is updated during transaction commit or abort. Since
latestCompletedXid is written only in places that had to lock ProcArrayLock
exclusively anyway, and is read only in places that had to lock ProcArrayLock
shared anyway, it adds no new locking requirements to the system despite being
cluster-wide. Moreover, removing ReadNewTransactionId from snapshot
acquisition eliminates the need to take both XidGenLock and ProcArrayLock at
the same time. Since XidGenLock is sometimes held across I/O this can be a
significant win. Some preliminary benchmarking suggested that this patch has
no effect on average throughput but can significantly improve the worst-case
transaction times seen in pgbench. Concept by Florian Pflug, implementation
by Tom Lane.
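
A minimal sketch of the idea, assuming a toy `ProcArray` class (a hypothetical name, not the real structure) and ignoring XID wraparound: the snapshot upper bound is derived from a variable maintained at transaction end, so taking a snapshot never has to consult the XID generator.

```python
class ProcArray:
    """Toy model: the running XIDs plus the newest XID known to have finished."""

    def __init__(self, latest_completed_xid):
        self.running = set()
        self.latest_completed_xid = latest_completed_xid

    def begin(self, xid):
        self.running.add(xid)

    def end_transaction(self, xid):
        # Commit or abort. In the real system this path already holds
        # ProcArrayLock exclusively, so the extra write costs no new lock.
        self.running.discard(xid)
        if xid > self.latest_completed_xid:
            self.latest_completed_xid = xid

    def get_snapshot(self):
        # In the real system this path already holds ProcArrayLock shared.
        # Everything at or above xmax is treated as "in the future".
        xmax = self.latest_completed_xid + 1
        in_progress = sorted(x for x in self.running if x < xmax)
        xmin = in_progress[0] if in_progress else xmax
        return xmin, xmax, in_progress
```

With XIDs 100, 101, and 102 running and 101 then finishing, `get_snapshot()` returns an xmax of 102: transaction 102 is still running, but it is treated as not yet started from the snapshot's point of view, which is equally safe.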

06 Sep, 2007 1 commit

Implement lazy XID allocation: transactions that do not modify any database · 295e6398

Committed by Tom Lane on Sep 05, 2007

rows will normally never obtain an XID at all. We already did things this way
for subtransactions, but this patch extends the concept to top-level
transactions. In applications where there are lots of short read-only
transactions, this should improve performance noticeably; not so much from
removal of the actual XID-assignments, as from reduction of overhead that's
driven by the rate of XID consumption. We add a concept of a "virtual
transaction ID" so that active transactions can be uniquely identified even
if they don't have a regular XID. This is a much lighter-weight concept:
uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
record is made about them.

Florian Pflug, with some editorialization by Tom.
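
The split between a cheap virtual ID and a lazily assigned real XID might be pictured like this. All names here (`XidAllocator`, `Session`, the `(backend_id, counter)` pair used as a VXID) are assumptions made for illustration; the commit message does not spell out the actual layout.

```python
import itertools


class XidAllocator:
    """Hypothetical cluster-wide source of real transaction IDs."""

    def __init__(self, first_xid):
        self._counter = itertools.count(first_xid)
        self.assigned = 0

    def next_xid(self):
        self.assigned += 1
        return next(self._counter)


class Session:
    def __init__(self, backend_id, allocator):
        self.backend_id = backend_id
        self.allocator = allocator
        self.local_counter = 0
        self.vxid = None
        self.xid = None

    def begin(self):
        # A virtual ID is purely local bookkeeping: no shared counter is
        # touched and nothing is recorded on disk.
        self.local_counter += 1
        self.vxid = (self.backend_id, self.local_counter)
        self.xid = None

    def read_row(self):
        # Reads never need a real XID.
        return self.vxid

    def modify_row(self):
        # The first modification is what forces allocation of a real XID.
        if self.xid is None:
            self.xid = self.allocator.next_xid()
        return self.xid
```

A session that only reads leaves `allocator.assigned` untouched, which is the point of the change: the rate of XID consumption drops for read-mostly workloads.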

29 Aug, 2007 1 commit

Add a debug logging message when a resource manager rejects an attempted · a52e4408

Committed by Tom Lane on Aug 28, 2007

restart point. Per suggestion from Simon Riggs.

14 Aug, 2007 1 commit

Fix two bugs induced in VACUUM FULL by async-commit patch. · 647fd9a1

Committed by Tom Lane on Aug 13, 2007

First, we cannot assume that XLogAsyncCommitFlush guarantees hint bits will be
settable, because clog.c's inexact LSN bookkeeping results in windows where a
previously flushed transaction is considered unhintable because it shares an
LSN slot with a later unflushed transaction. But repair_frag requires
XMIN_COMMITTED to be correct so that it can distinguish tuples moved by the
current vacuum. Since not being able to set the bit is an uncommon corner
case, the most practical way of dealing with it seems to be to abandon
shrinking (ie, don't invoke repair_frag) when we find a non-dead tuple whose
XMIN_COMMITTED bit couldn't be set.

Second, it is possible for the same reason that a RECENTLY_DEAD tuple does not
get its XMAX_COMMITTED bit set during scan_heap. But by the time repair_frag
examines the tuple it might be possible to set the bit. We therefore must
take buffer content lock when calling HeapTupleSatisfiesVacuum a second time,
else we can get an Assert failure in SetBufferCommitInfoNeedsSave. This
latter bug is latent in existing releases, but I think it cannot actually
occur without async commit, since the first HeapTupleSatisfiesVacuum call
should always have set the bit. So I'm not going to back-patch it.

In passing, reduce the existing "cannot shrink relation" messages from NOTICE
to LOG level. The new message must be no higher than LOG if we don't want
unpredictable regression test failures, and consistency seems like a good
idea. Also arrange that only one such message is reported per VACUUM FULL;
in typical scenarios you could get spammed with many such messages, which
seems a bit useless.

04 Aug, 2007 1 commit

Switch over to using the src/timezone functions for formatting timestamps · bdd6b622

Committed by Tom Lane on Aug 04, 2007

displayed in the postmaster log. This avoids Windows-specific problems with
localized time zone names that are in the wrong encoding, and generally seems
like a good idea to forestall other potential platform-dependent issues.
To preserve the existing behavior that all backends will log in the same time
zone, create a new GUC variable log_timezone that can only be changed on a
system-wide basis, and reference log-related calculations to that zone instead
of the TimeZone variable.

This fixes the issue reported by Hiroshi Saito that timestamps printed by
xlog.c startup could be improperly localized on Windows. We still need a
simpler patch for that problem in the back branches, however.

02 Aug, 2007 1 commit

Support an optional asynchronous commit mode, in which we don't flush WAL · 4a78cdeb

Committed by Tom Lane on Aug 01, 2007

before reporting a transaction committed. Data consistency is still
guaranteed (unlike setting fsync = off), but a crash may lose the effects
of the last few transactions. Patch by Simon, some editorialization by Tom.
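
One way to see why consistency survives is a toy model in which the log is an ordered list and only a flushed prefix is durable. The `Wal` class below is a hypothetical illustration, not the server's WAL machinery.

```python
class Wal:
    """Toy write-ahead log: a crash keeps only the flushed prefix."""

    def __init__(self):
        self.records = []      # commit records in log order
        self.flushed_upto = 0  # how many records are durable

    def flush(self):
        self.flushed_upto = len(self.records)

    def commit(self, xid, synchronous=True):
        self.records.append(xid)
        if synchronous:
            # Flushing always covers every earlier record as well.
            self.flush()
        # An asynchronous commit returns here without waiting for the flush.

    def crash_and_recover(self):
        # Whatever was not flushed is lost, but it is always a suffix of the
        # log, so the recovered state is some earlier consistent state.
        return self.records[:self.flushed_upto]
```

Committing transaction 1 synchronously and then 2 and 3 asynchronously, followed by a crash before the next flush, recovers only transaction 1; nothing is left half-applied.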

24 Jul, 2007 1 commit

Create a new dedicated Postgres process, "wal writer", which exists to write · ad429572

Committed by Tom Lane on Jul 24, 2007

and fsync WAL at convenient intervals. For the moment it just tries to
offload this work from backends, but soon it will be responsible for
guaranteeing a maximum delay before asynchronously-committed transactions
will be flushed to disk.

This is a portion of Simon Riggs' async-commit patch, committed to CVS
separately because a background WAL writer seems like it might be a good idea
independently of the async-commit feature. I rebased walwriter.c on
bgwriter.c because it seemed like a more appropriate way of handling signals;
while the startup/shutdown logic in postmaster.c is more like autovac because
we want walwriter to quit before we start the shutdown checkpoint.

01 Jul, 2007 1 commit

Improve logging of checkpoints. Patch by Greg Smith, worked over · 9fc25c05

Committed by Tom Lane on Jun 30, 2007

by Heikki and a little bit by me.

28 Jun, 2007 1 commit

Implement "distributed" checkpoints in which the checkpoint I/O is spread · 867e2c91

Committed by Tom Lane on Jun 28, 2007

over a fairly long period of time, rather than being spat out in a burst.
This happens only for background checkpoints carried out by the bgwriter;
other cases, such as a shutdown checkpoint, are still done at full speed.

Remove the "all buffers" scan in the bgwriter, and associated stats
infrastructure, since this seems no longer very useful when the checkpoint
itself is properly throttled.

Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas,
and some minor API editorialization by me.
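
The throttling idea can be sketched as a simple pacing loop. This is a simplified illustration under stated assumptions: `spread_checkpoint` and its parameters are hypothetical, and the real bgwriter paces itself against more inputs than a fixed duration. The clock and sleep functions are passed in so the sketch can be exercised without real delays.

```python
def spread_checkpoint(buffers, write, duration, now, sleep, immediate=False):
    """Write all dirty buffers, pacing the writes evenly over `duration`.

    `immediate=True` models the full-speed cases such as a shutdown
    checkpoint, where no delay is inserted.
    """
    start = now()
    total = len(buffers)
    for done, buf in enumerate(buffers, start=1):
        write(buf)
        if immediate:
            continue
        # Point in time by which `done` out of `total` writes are due.
        target = start + duration * done / total
        delay = target - now()
        if delay > 0:
            # Ahead of schedule: nap instead of issuing I/O in a burst.
            sleep(delay)
```

When the writes themselves are slow the computed delay goes negative and the loop simply proceeds without sleeping, so the checkpoint falls back to full speed rather than overrunning further.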

31 May, 2007 3 commits

Make some messages more consistent · 7ce9b368

Committed by Peter Eisentraut on May 31, 2007

Downgrade some low-level startup messages to DEBUG1. · 71fb7b90

Committed by Peter Eisentraut on May 31, 2007

Make large sequential scans and VACUUMs work in a limited-size "ring" of · d526575f

Committed by Tom Lane on May 30, 2007

buffers, rather than blowing out the whole shared-buffer arena. Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer. Those flushes will now occur only once per
ring-ful. The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done. The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.

This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it. To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.

Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.

Original patch by Simon, reworked by Heikki and again by Tom.
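
As a rough picture of the ring behaviour, the sketch below reuses a fixed set of slots cyclically, so a scan of any length occupies at most `ring_size` buffers. `BufferRing` is a hypothetical name; the real BufferAccessStrategy object interacts with the shared buffer pool in ways this toy omits.

```python
class BufferRing:
    """Toy ring strategy: a bulk scan recycles its own small set of slots."""

    def __init__(self, ring_size):
        self.slots = [None] * ring_size
        self.next_slot = 0

    def read_page(self, page_no):
        # Reuse the oldest slot of the ring rather than claiming a fresh
        # buffer from the shared pool; return the page that was displaced.
        evicted = self.slots[self.next_slot]
        self.slots[self.next_slot] = page_no
        self.next_slot = (self.next_slot + 1) % len(self.slots)
        return evicted
```

Reading five pages through a three-slot ring displaces only the scan's own earlier pages (0 and 1), leaving whatever else sits in the shared pool untouched. For VACUUM, a dirty page needs its WAL flushed only when its slot is about to be reused, which is where the "once per ring-ful" behaviour comes from.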

21 May, 2007 1 commit

To support external compression of archived WAL data, add a flag bit to · a8d539f1

Committed by Tom Lane on May 20, 2007

WAL records that shows whether it is safe to remove full-page images
(ie, whether or not an on-line backup was in progress when the WAL entry
was made).  Also make provision for an XLOG_NOOP record type that can be
used to fill in the extra space when decompressing the data for restore.

This is the portion of Koichi Suzuki's "full page writes" patch that
has to go into the core database.  The remainder of that work is two
external compression and decompression programs, which for the time being
will undergo separate development on pgfoundry.  Per discussion.

Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be
possible to compress them (the previous coding caused essential info
to be omitted).  The other commonly-used record types seem OK already,
with the possible exception of GIN and GIST WAL records, which I don't
understand well enough to opine on.

01 May, 2007 1 commit

Change the timestamps recorded in transaction commit/abort xlog records · c4320619

Committed by Tom Lane on Apr 30, 2007

from time_t to TimestampTz representation. This provides full gettimeofday()
resolution of the timestamps, which might be useful when attempting to
do point-in-time recovery --- previously it was not possible to specify
the stop point with sub-second resolution. But mostly this is to get
rid of TimestampTz-to-time_t conversion overhead during commit. Per my
proposal of a day or two back.
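
The resolution difference can be illustrated with a small conversion sketch. It assumes the 64-bit integer form of TimestampTz (microseconds since 2000-01-01 00:00:00 UTC); builds of that era could also use a floating-point representation, which this sketch does not cover. The function names are hypothetical.

```python
# Seconds between the Unix epoch (1970-01-01) and 2000-01-01 00:00:00 UTC.
POSTGRES_EPOCH_OFFSET_SECS = 946_684_800
USECS_PER_SEC = 1_000_000


def unix_to_timestamptz(seconds, microseconds):
    """Convert a gettimeofday()-style (sec, usec) pair to integer microseconds."""
    return (seconds - POSTGRES_EPOCH_OFFSET_SECS) * USECS_PER_SEC + microseconds


def timestamptz_to_time_t(ts):
    """Truncate back to whole Unix seconds, discarding the sub-second part."""
    return ts // USECS_PER_SEC + POSTGRES_EPOCH_OFFSET_SECS
```

Two commits within the same wall-clock second map to the same time_t value but to distinct microsecond timestamps, which is what allows a recovery stop point to fall between them.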

04 Apr, 2007 1 commit

Remove the CheckpointStartLock in favor of having backends show whether they · 9c9b6194

Committed by Tom Lane on Apr 03, 2007

are in their commit critical sections via flags in the ProcArray. Checkpoint
can watch the ProcArray to determine when it's safe to proceed. This is
a considerably better solution to the original problem of race conditions
between checkpoint and transaction commit: it speeds up commit, since there's
one less lock to fool with, and it prevents the problem of checkpoint being
delayed indefinitely when there's a constant flow of commits. Heikki, with
some kibitzing from Tom.

03 Apr, 2007 1 commit

Decouple the values of TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE. · b3005276

Committed by Tom Lane on Apr 03, 2007

Add the latter to the values checked in pg_control, since it can't be changed
without invalidating toast table content. This commit in itself shouldn't
change any behavior, but it lays some necessary groundwork for experimentation
with these toast-control numbers.

Note: while TOAST_TUPLE_THRESHOLD can now be changed without initdb, some
thought still needs to be given to needs_toast_table() in toasting.c before
unleashing random changes.

04 Mar, 2007 1 commit

Remove undo information from pg_controldata --- never used. · ae35867a

Committed by Bruce Momjian on Mar 03, 2007

Florian G. Pflug

14 Feb, 2007 1 commit

Move fsync method macro defines into /include/access/xlogdefs.h so they · a9eb5396

Committed by Bruce Momjian on Feb 14, 2007

can be used by src/tools/fsync/test_fsync.c.

08 Feb, 2007 2 commits

Normalize fgets() calls to use sizeof() for calculating the buffer size · 086c1894

Committed by Peter Eisentraut on Feb 08, 2007

where possible, and fix some sites that apparently thought that fgets()
will overwrite the buffer by one byte.

Also add some strlcpy() to eliminate some weird memory handling.

Remove the xlog-centric "database system is ready" message and replace it with · 78d12161

Committed by Tom Lane on Feb 07, 2007

"database system is ready to accept connections", which is issued by the
postmaster when it really is ready to accept connections. Per proposal from
Markus Schiltknecht and subsequent discussion.

02 Feb, 2007 1 commit

Wording cleanup for error messages. Also change can't -> cannot. · 8b4ff8b6

Committed by Bruce Momjian on Feb 01, 2007

Standard English uses "may", "can", and "might" in different ways:

        may - permission, "You may borrow my rake."

        can - ability, "I can lift that log."

        might - possibility, "It might rain today."

Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice.  Similarly, "It may crash" is better stated, "It might crash".

06 Jan, 2007 1 commit

Update CVS HEAD for 2007 copyright. Back branches are typically not · 29dccf5f

Committed by Bruce Momjian on Jan 05, 2007

back-stamped for this.

09 Dec, 2006 1 commit

Remove the logId/logSeg fields from pg_control, because they are not needed · 0cb91ccb

Committed by Tom Lane on Dec 08, 2006

in normal operation, and we can avoid rewriting pg_control at every log
segment switch if we don't insist that these values be valid. Reducing
the number of pg_control updates is a good idea for both performance and
reliability. It does make pg_resetxlog's life a bit harder, but that seems
a good tradeoff; and anyway the change to pg_resetxlog amounts to automating
something people formerly needed to do by hand, namely look at the existing
pg_xlog files to make sure the new WAL start point was past them.

In passing, change the wording of xlog.c's "database system was interrupted"
messages: describe the pg_control timestamp as "last known up at" rather than
implying it is the exact time of service interruption. With this change the
timestamp will generally be the time of the last checkpoint, which could be
many minutes before the failure; and we've already seen indications that
people tend to misinterpret the old wording.

initdb forced due to change in pg_control layout. Simon Riggs and Tom Lane

01 Dec, 2006 1 commit

Minor adjustments to make failures in startup/shutdown behave more cleanly. · 5f60086e

Committed by Tom Lane on Nov 30, 2006

StartupXLOG and ShutdownXLOG no longer need to be critical sections, because
in all contexts where they are invoked, elog(ERROR) would be translated to
elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true:
set ExitOnAnyError before trying to exit. This is a good fix anyway since
the existing code would have gone into an infinite loop on elog(ERROR) during
shutdown.) That avoids a misleading report of PANIC during semi-orderly
failures. Modify the postmaster to include the startup process in the set of
processes that get SIGTERM when a fast shutdown is requested, and also fix it
to not try to restart the bgwriter if the bgwriter fails while trying to write
the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does
something reasonable for a system in warm standby mode, and so should Unix
system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and
some corner-case testing of my own.

22 Nov, 2006 1 commit

On systems that have setsid(2) (which should be just about everything except · 3ad0728c

Committed by Tom Lane on Nov 21, 2006

Windows), arrange for each postmaster child process to be its own process
group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
process group not only the direct child process. This provides saner behavior
for archive and recovery scripts; in particular, it's possible to shut down a
warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
of SIGQUIT to the startup subprocess will result in killing the waiting
recovery_command. Also, this makes Query Cancel and statement_timeout apply
to scripts being run from backends via system(). (There is no support in the
core backend for that, but it's widely done using untrusted PLs.) Per gripe
from Stephen Harris and subsequent discussion.

16 Nov, 2006 1 commit

String fix · e138b809

Committed by Peter Eisentraut on Nov 16, 2006

11 Nov, 2006 1 commit

Clean up some misleading references to %p being a full path, per Simon. · 792d6edd

Committed by Tom Lane on Nov 10, 2006

09 Nov, 2006 1 commit

Change Windows rename and unlink substitutes so that they time out after · dcbdf9b1

Committed by Tom Lane on Nov 08, 2006

30 seconds instead of retrying forever. Also modify xlog.c so that if
it fails to rename an old xlog segment up to a future slot, it will
unlink the segment instead. Per discussion of bug #2712, in which it
became apparent that Windows can handle unlinking a file that's being
held open, but not renaming it.
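
The retry-then-fall-back behaviour might look roughly like the sketch below. It is a Python analogue written for illustration, not the Windows substitutes themselves: `rename_with_timeout`, `recycle_segment`, and the injectable `rename`, `now`, and `sleep` parameters are all hypothetical.

```python
import os
import time


def rename_with_timeout(src, dst, timeout=30.0, interval=0.1,
                        rename=os.replace, now=time.monotonic,
                        sleep=time.sleep):
    """Retry a failing rename until it succeeds or `timeout` seconds pass."""
    deadline = now() + timeout
    while True:
        try:
            rename(src, dst)
            return True
        except OSError:
            if now() >= deadline:
                return False  # give up instead of retrying forever
            sleep(interval)


def recycle_segment(old_path, future_path, **rename_options):
    """Try to recycle an old segment under a future name; else remove it."""
    if rename_with_timeout(old_path, future_path, **rename_options):
        return "recycled"
    # Renaming kept failing (for example, the file is held open elsewhere),
    # so drop the segment rather than stalling.
    os.unlink(old_path)
    return "removed"
```

The fallback only loses the optimisation of reusing an already allocated file; a fresh segment can be created later when it is needed.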

06 Nov, 2006 1 commit

Fix recently-understood problems with handling of XID freezing, particularly · 48188e16

Committed by Tom Lane on Nov 05, 2006

in PITR scenarios. We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases. Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId. Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done. Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database. initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.

19 Oct, 2006 1 commit

Add some code to CREATE DATABASE to check for pre-existing subdirectories · 1e758d52

Committed by Tom Lane on Oct 18, 2006

that conflict with the OID that we want to use for the new database.
This avoids the risk of trying to remove files that maybe we shouldn't
remove. Per gripe from Jon Lapham and subsequent discussion of 27-Sep.

07 Oct, 2006 1 commit

Message style improvements · b9b4f10b

Committed by Peter Eisentraut on Oct 06, 2006