Commits · 89fefd9416bfe6bec78fbc14bef06754cb4cc866 · Greenplum / Gpdb

14 Aug, 2007 1 commit

Fix two bugs induced in VACUUM FULL by async-commit patch. · 647fd9a1

Committed by Tom Lane on Aug 13, 2007

First, we cannot assume that XLogAsyncCommitFlush guarantees hint bits will be
settable, because clog.c's inexact LSN bookkeeping results in windows where a
previously flushed transaction is considered unhintable because it shares an
LSN slot with a later unflushed transaction. But repair_frag requires
XMIN_COMMITTED to be correct so that it can distinguish tuples moved by the
current vacuum. Since not being able to set the bit is an uncommon corner
case, the most practical way of dealing with it seems to be to abandon
shrinking (ie, don't invoke repair_frag) when we find a non-dead tuple whose
XMIN_COMMITTED bit couldn't be set.
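
The "shared LSN slot" window described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `CommitLog`, its methods, and the group size `XACTS_PER_LSN_SLOT` are hypothetical names and values chosen for illustration, not the data structures in clog.c. The idea shown is that when several transactions share one remembered LSN, a transaction that has already been flushed can still look unhintable because a later, unflushed transaction raised the slot's LSN.

```python
from dataclasses import dataclass, field

XACTS_PER_LSN_SLOT = 4  # hypothetical group size, for illustration only


@dataclass
class CommitLog:
    """Toy commit log that remembers one LSN per group of transaction IDs."""
    slot_lsn: dict = field(default_factory=dict)

    def record_commit(self, xid: int, commit_lsn: int) -> None:
        slot = xid // XACTS_PER_LSN_SLOT
        # Only the highest commit LSN in the group is kept (inexact bookkeeping).
        self.slot_lsn[slot] = max(self.slot_lsn.get(slot, 0), commit_lsn)

    def can_set_hint(self, xid: int, flushed_lsn: int) -> bool:
        # A hint bit is considered safe only if the slot's LSN is already on disk.
        slot = xid // XACTS_PER_LSN_SLOT
        return self.slot_lsn.get(slot, 0) <= flushed_lsn


clog = CommitLog()
clog.record_commit(xid=100, commit_lsn=1000)
flushed_lsn = 1000                            # xid 100's commit record is flushed
clog.record_commit(xid=101, commit_lsn=2000)  # same slot, not yet flushed

# xid 100 is durable, yet it is reported as unhintable until LSN 2000 is flushed.
hintable_now = clog.can_set_hint(100, flushed_lsn)
hintable_later = clog.can_set_hint(100, 2000)
```

In this model, abandoning the shrink step corresponds to seeing `hintable_now` come back false for a non-dead tuple's inserting transaction.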

Second, it is possible for the same reason that a RECENTLY_DEAD tuple does not
get its XMAX_COMMITTED bit set during scan_heap. But by the time repair_frag
examines the tuple it might be possible to set the bit. We therefore must
take buffer content lock when calling HeapTupleSatisfiesVacuum a second time,
else we can get an Assert failure in SetBufferCommitInfoNeedsSave. This
latter bug is latent in existing releases, but I think it cannot actually
occur without async commit, since the first HeapTupleSatisfiesVacuum call
should always have set the bit. So I'm not going to back-patch it.

In passing, reduce the existing "cannot shrink relation" messages from NOTICE
to LOG level. The new message must be no higher than LOG if we don't want
unpredictable regression test failures, and consistency seems like a good
idea. Also arrange that only one such message is reported per VACUUM FULL;
in typical scenarios you could get spammed with many such messages, which
seems a bit useless.

647fd9a1

04 Aug, 2007 1 commit

Switch over to using the src/timezone functions for formatting timestamps · bdd6b622

Committed by Tom Lane on Aug 04, 2007

displayed in the postmaster log. This avoids Windows-specific problems with
localized time zone names that are in the wrong encoding, and generally seems
like a good idea to forestall other potential platform-dependent issues.
To preserve the existing behavior that all backends will log in the same time
zone, create a new GUC variable log_timezone that can only be changed on a
system-wide basis, and reference log-related calculations to that zone instead
of the TimeZone variable.

This fixes the issue reported by Hiroshi Saito that timestamps printed by
xlog.c startup could be improperly localized on Windows. We still need a
simpler patch for that problem in the back branches, however.

bdd6b622

02 Aug, 2007 1 commit

Support an optional asynchronous commit mode, in which we don't flush WAL · 4a78cdeb

Committed by Tom Lane on Aug 01, 2007

before reporting a transaction committed. Data consistency is still
guaranteed (unlike setting fsync = off), but a crash may lose the effects
of the last few transactions. Patch by Simon, some editorialization by Tom.

4a78cdeb

24 Jul, 2007 1 commit

Create a new dedicated Postgres process, "wal writer", which exists to write · ad429572

Committed by Tom Lane on Jul 24, 2007

and fsync WAL at convenient intervals. For the moment it just tries to
offload this work from backends, but soon it will be responsible for
guaranteeing a maximum delay before asynchronously-committed transactions
will be flushed to disk.

This is a portion of Simon Riggs' async-commit patch, committed to CVS
separately because a background WAL writer seems like it might be a good idea
independently of the async-commit feature. I rebased walwriter.c on
bgwriter.c because it seemed like a more appropriate way of handling signals;
while the startup/shutdown logic in postmaster.c is more like autovac because
we want walwriter to quit before we start the shutdown checkpoint.

ad429572

01 Jul, 2007 1 commit

Improve logging of checkpoints. Patch by Greg Smith, worked over · 9fc25c05

Committed by Tom Lane on Jun 30, 2007

by Heikki and a little bit by me.

9fc25c05

28 Jun, 2007 1 commit

Implement "distributed" checkpoints in which the checkpoint I/O is spread · 867e2c91

Committed by Tom Lane on Jun 28, 2007

over a fairly long period of time, rather than being spat out in a burst.
This happens only for background checkpoints carried out by the bgwriter;
other cases, such as a shutdown checkpoint, are still done at full speed.

Remove the "all buffers" scan in the bgwriter, and associated stats
infrastructure, since this seems no longer very useful when the checkpoint
itself is properly throttled.

Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas,
and some minor API editorialization by me.

867e2c91

31 May, 2007 3 commits

Make some messages more consistent · 7ce9b368

Committed by Peter Eisentraut on May 31, 2007

7ce9b368

Downgrade some low-level startup messages to DEBUG1. · 71fb7b90

Committed by Peter Eisentraut on May 31, 2007

71fb7b90

Make large sequential scans and VACUUMs work in a limited-size "ring" of · d526575f

Committed by Tom Lane on May 30, 2007

buffers, rather than blowing out the whole shared-buffer arena. Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer. Those flushes will now occur only once per
ring-ful. The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done. The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.

This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it. To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.
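
The usage-count change can be sketched as a small clock-sweep model. This is a minimal illustration, not the buffer manager's code: `Buffer`, `pin`, `unpin`, `clock_sweep`, and the cap `MAX_USAGE_COUNT` are hypothetical names, and the model bumps the count on every pin rather than tracking per-backend first pins. It shows the two rules from the message: usage_count advances at pin time, and the sweep leaves pinned buffers untouched.

```python
from dataclasses import dataclass

MAX_USAGE_COUNT = 5  # assumed cap, for illustration only


@dataclass
class Buffer:
    usage_count: int = 0
    pin_count: int = 0


def pin(buf: Buffer) -> None:
    # usage_count is advanced when the buffer is pinned, not when it is released.
    buf.pin_count += 1
    buf.usage_count = min(buf.usage_count + 1, MAX_USAGE_COUNT)


def unpin(buf: Buffer) -> None:
    buf.pin_count -= 1


def clock_sweep(buffers, hand):
    """Return (victim_index, new_hand); pinned buffers are skipped untouched."""
    if all(b.pin_count > 0 for b in buffers):
        raise RuntimeError("no unpinned buffers available")
    while True:
        index = hand
        buf = buffers[index]
        hand = (hand + 1) % len(buffers)
        if buf.pin_count > 0:
            continue  # a pinned buffer's lifetime does not start to decrease yet
        if buf.usage_count == 0:
            return index, hand
        buf.usage_count -= 1


pool = [Buffer(), Buffer(), Buffer()]
pin(pool[0])                  # stays pinned
pin(pool[1]); unpin(pool[1])  # used once, then released
victim, hand = clock_sweep(pool, hand=0)
```

After the sweep, the still-pinned buffer keeps its usage_count, while the released one has been aged by one step.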

Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.

Original patch by Simon, reworked by Heikki and again by Tom.

d526575f

21 May, 2007 1 commit

To support external compression of archived WAL data, add a flag bit to · a8d539f1

Committed by Tom Lane on May 20, 2007

WAL records that shows whether it is safe to remove full-page images
(ie, whether or not an on-line backup was in progress when the WAL entry
was made).  Also make provision for an XLOG_NOOP record type that can be
used to fill in the extra space when decompressing the data for restore.

This is the portion of Koichi Suzuki's "full page writes" patch that
has to go into the core database.  The remainder of that work is two
external compression and decompression programs, which for the time being
will undergo separate development on pgfoundry.  Per discussion.

Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be
possible to compress them (the previous coding caused essential info
to be omitted).  The other commonly-used record types seem OK already,
with the possible exception of GIN and GIST WAL records, which I don't
understand well enough to opine on.

a8d539f1

01 May, 2007 1 commit

Change the timestamps recorded in transaction commit/abort xlog records · c4320619

Committed by Tom Lane on Apr 30, 2007

from time_t to TimestampTz representation. This provides full gettimeofday()
resolution of the timestamps, which might be useful when attempting to
do point-in-time recovery --- previously it was not possible to specify
the stop point with sub-second resolution. But mostly this is to get
rid of TimestampTz-to-time_t conversion overhead during commit. Per my
proposal of a day or two back.
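
To make the resolution difference concrete, here is a small sketch comparing a whole-second time_t value with a microsecond count in the style of TimestampTz. It assumes the integer-datetimes representation (microseconds since 2000-01-01 UTC); builds using floating-point timestamps store seconds as a double instead. The helper names `to_time_t` and `to_timestamptz_usec` are hypothetical.

```python
from datetime import datetime, timezone

# Assumed epoch for the integer TimestampTz representation.
POSTGRES_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def to_time_t(moment: datetime) -> int:
    """Whole seconds since the Unix epoch; sub-second detail is discarded."""
    return int(moment.timestamp())


def to_timestamptz_usec(moment: datetime) -> int:
    """Microseconds since 2000-01-01 UTC, computed with integer arithmetic."""
    delta = moment - POSTGRES_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


commit_a = datetime(2007, 4, 30, 12, 0, 0, 250_000, tzinfo=timezone.utc)
commit_b = datetime(2007, 4, 30, 12, 0, 0, 750_000, tzinfo=timezone.utc)

same_second = to_time_t(commit_a) == to_time_t(commit_b)
gap_usec = to_timestamptz_usec(commit_b) - to_timestamptz_usec(commit_a)
```

Two commits half a second apart are indistinguishable as time_t values, which is why a recovery stop point between them could not previously be expressed.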

c4320619

04 Apr, 2007 1 commit

Remove the CheckpointStartLock in favor of having backends show whether they · 9c9b6194

Committed by Tom Lane on Apr 03, 2007

are in their commit critical sections via flags in the ProcArray. Checkpoint
can watch the ProcArray to determine when it's safe to proceed. This is
a considerably better solution to the original problem of race conditions
between checkpoint and transaction commit: it speeds up commit, since there's
one less lock to fool with, and it prevents the problem of checkpoint being
delayed indefinitely when there's a constant flow of commits. Heikki, with
some kibitzing from Tom.
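
The flag-based approach can be sketched roughly as follows. This is a simplified model, not the ProcArray code: `Backend`, `in_commit`, `transactions_in_commit`, and `still_in_commit` are hypothetical names. The checkpoint records which transactions are inside their commit critical section at one instant and waits only for those, so commits that start later cannot hold it up.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    xid: int
    in_commit: bool = False  # set while inside the commit critical section


def transactions_in_commit(proc_array):
    """Snapshot the XIDs that are mid-commit at this moment."""
    return {b.xid for b in proc_array if b.in_commit}


def still_in_commit(proc_array, xids):
    """True while any transaction from the snapshot is still committing."""
    return any(b.in_commit and b.xid in xids for b in proc_array)


first = Backend(xid=10, in_commit=True)
second = Backend(xid=11)
proc_array = [first, second]

waiting_for = transactions_in_commit(proc_array)  # only xid 10
second.in_commit = True                           # a later commit begins
must_wait = still_in_commit(proc_array, waiting_for)
first.in_commit = False                           # the snapshotted commit ends
can_proceed = not still_in_commit(proc_array, waiting_for)
```

The checkpoint proceeds once `first` finishes, even though `second` is still committing, which is the "not delayed indefinitely" property the message describes.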

9c9b6194

03 Apr, 2007 1 commit

Decouple the values of TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE. · b3005276

Committed by Tom Lane on Apr 03, 2007

Add the latter to the values checked in pg_control, since it can't be changed
without invalidating toast table content. This commit in itself shouldn't
change any behavior, but it lays some necessary groundwork for experimentation
with these toast-control numbers.

Note: while TOAST_TUPLE_THRESHOLD can now be changed without initdb, some
thought still needs to be given to needs_toast_table() in toasting.c before
unleashing random changes.

b3005276

04 Mar, 2007 1 commit

Remove undo information from pg_controldata --- never used. · ae35867a

Committed by Bruce Momjian on Mar 03, 2007

Florian G. Pflug

ae35867a

14 Feb, 2007 1 commit

Move fsync method macro defines into /include/access/xlogdefs.h so they · a9eb5396

Committed by Bruce Momjian on Feb 14, 2007

can be used by src/tools/fsync/test_fsync.c.

a9eb5396

08 Feb, 2007 2 commits

Normalize fgets() calls to use sizeof() for calculating the buffer size · 086c1894

Committed by Peter Eisentraut on Feb 08, 2007

where possible, and fix some sites that apparently thought that fgets()
will overwrite the buffer by one byte.

Also add some strlcpy() to eliminate some weird memory handling.

086c1894

Remove the xlog-centric "database system is ready" message and replace it with · 78d12161

Committed by Tom Lane on Feb 07, 2007

"database system is ready to accept connections", which is issued by the
postmaster when it really is ready to accept connections. Per proposal from
Markus Schiltknecht and subsequent discussion.

78d12161

02 Feb, 2007 1 commit

Wording cleanup for error messages. Also change can't -> cannot. · 8b4ff8b6

Committed by Bruce Momjian on Feb 01, 2007

Standard English uses "may", "can", and "might" in different ways:

        may - permission, "You may borrow my rake."

        can - ability, "I can lift that log."

        might - possibility, "It might rain today."

Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice.  Similarly, "It may crash" is better stated, "It might crash".

8b4ff8b6

06 Jan, 2007 1 commit

Update CVS HEAD for 2007 copyright. Back branches are typically not · 29dccf5f

Committed by Bruce Momjian on Jan 05, 2007

back-stamped for this.

29dccf5f

09 Dec, 2006 1 commit

Remove the logId/logSeg fields from pg_control, because they are not needed · 0cb91ccb

Committed by Tom Lane on Dec 08, 2006

in normal operation, and we can avoid rewriting pg_control at every log
segment switch if we don't insist that these values be valid. Reducing
the number of pg_control updates is a good idea for both performance and
reliability. It does make pg_resetxlog's life a bit harder, but that seems
a good tradeoff; and anyway the change to pg_resetxlog amounts to automating
something people formerly needed to do by hand, namely look at the existing
pg_xlog files to make sure the new WAL start point was past them.

In passing, change the wording of xlog.c's "database system was interrupted"
messages: describe the pg_control timestamp as "last known up at" rather than
implying it is the exact time of service interruption. With this change the
timestamp will generally be the time of the last checkpoint, which could be
many minutes before the failure; and we've already seen indications that
people tend to misinterpret the old wording.

initdb forced due to change in pg_control layout. Simon Riggs and Tom Lane

0cb91ccb

01 Dec, 2006 1 commit

Minor adjustments to make failures in startup/shutdown behave more cleanly. · 5f60086e

Committed by Tom Lane on Nov 30, 2006

StartupXLOG and ShutdownXLOG no longer need to be critical sections, because
in all contexts where they are invoked, elog(ERROR) would be translated to
elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true:
set ExitOnAnyError before trying to exit. This is a good fix anyway since
the existing code would have gone into an infinite loop on elog(ERROR) during
shutdown.) That avoids a misleading report of PANIC during semi-orderly
failures. Modify the postmaster to include the startup process in the set of
processes that get SIGTERM when a fast shutdown is requested, and also fix it
to not try to restart the bgwriter if the bgwriter fails while trying to write
the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does
something reasonable for a system in warm standby mode, and so should Unix
system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and
some corner-case testing of my own.

5f60086e

22 Nov, 2006 1 commit

On systems that have setsid(2) (which should be just about everything except · 3ad0728c

Committed by Tom Lane on Nov 21, 2006

Windows), arrange for each postmaster child process to be its own process
group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
process group not only the direct child process. This provides saner behavior
for archive and recovery scripts; in particular, it's possible to shut down a
warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
of SIGQUIT to the startup subprocess will result in killing the waiting
recovery_command. Also, this makes Query Cancel and statement_timeout apply
to scripts being run from backends via system(). (There is no support in the
core backend for that, but it's widely done using untrusted PLs.) Per gripe
from Stephen Harris and subsequent discussion.

3ad0728c

16 Nov, 2006 1 commit

String fix · e138b809

Committed by Peter Eisentraut on Nov 16, 2006

e138b809

11 Nov, 2006 1 commit

Clean up some misleading references to %p being a full path, per Simon. · 792d6edd

Committed by Tom Lane on Nov 10, 2006

792d6edd

09 Nov, 2006 1 commit

Change Windows rename and unlink substitutes so that they time out after · dcbdf9b1

Committed by Tom Lane on Nov 08, 2006

30 seconds instead of retrying forever. Also modify xlog.c so that if
it fails to rename an old xlog segment up to a future slot, it will
unlink the segment instead. Per discussion of bug #2712, in which it
became apparent that Windows can handle unlinking a file that's being
held open, but not renaming it.
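
The behavior described above (retry a rename for a bounded time, then fall back to unlinking) can be sketched as below. This is an illustration only, not the Windows substitute functions: `rename_with_timeout`, `recycle_segment`, and the injectable `rename` parameter are hypothetical, and the demo uses a stand-in that always fails, with a very short timeout, to simulate a file held open by another process.

```python
import os
import tempfile
import time


def rename_with_timeout(src, dst, timeout=30.0, interval=0.1, rename=os.rename):
    """Retry the rename until it succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            rename(src, dst)
            return True
        except PermissionError:
            if time.monotonic() >= deadline:
                return False  # give up instead of retrying forever
            time.sleep(interval)


def recycle_segment(old_path, future_path, **options):
    """Rename an old segment to a future slot, or unlink it if that fails."""
    if rename_with_timeout(old_path, future_path, **options):
        return "renamed"
    os.unlink(old_path)
    return "unlinked"


def always_busy(src, dst):
    # Stand-in for a rename that keeps failing because the file is held open.
    raise PermissionError("file is held open by another process")


with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "old_segment")
    open(old, "w").close()
    outcome = recycle_segment(old, os.path.join(tmp, "future_segment"),
                              timeout=0.05, interval=0.01, rename=always_busy)
    old_still_exists = os.path.exists(old)
```

With the default arguments the retry window would be the 30 seconds mentioned in the message; the short values here only keep the demo fast.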

dcbdf9b1

06 Nov, 2006 1 commit

Fix recently-understood problems with handling of XID freezing, particularly · 48188e16

Committed by Tom Lane on Nov 05, 2006

in PITR scenarios. We now WAL-log the replacement of old XIDs with
FrozenTransactionId, so that such replacement is guaranteed to propagate to
PITR slave databases. Also, rather than relying on hint-bit updates to be
preserved, pg_clog is not truncated until all instances of an XID are known to
have been replaced by FrozenTransactionId. Add new GUC variables and
pg_autovacuum columns to allow management of the freezing policy, so that
users can trade off the size of pg_clog against the amount of freezing work
done. Revise the already-existing code that forces autovacuum of tables
approaching the wraparound point to make it more bulletproof; also, revise the
autovacuum logic so that anti-wraparound vacuuming is done per-table rather
than per-database. initdb forced because of changes in pg_class, pg_database,
and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.

48188e16

19 Oct, 2006 1 commit

Add some code to CREATE DATABASE to check for pre-existing subdirectories · 1e758d52

Committed by Tom Lane on Oct 18, 2006

that conflict with the OID that we want to use for the new database.
This avoids the risk of trying to remove files that maybe we shouldn't
remove. Per gripe from Jon Lapham and subsequent discussion of 27-Sep.

1e758d52

07 Oct, 2006 1 commit

Message style improvements · b9b4f10b

Committed by Peter Eisentraut on Oct 06, 2006

b9b4f10b

04 Oct, 2006 1 commit

pgindent run for 8.2. · f99a569a

Committed by Bruce Momjian on Oct 04, 2006

f99a569a

22 Aug, 2006 1 commit

Make the server track an 'XID epoch', that is, maintain higher-order bits · 35af5422

Committed by Tom Lane on Aug 21, 2006

of the transaction ID counter. Nothing is done with the epoch except to
store it in checkpoint records, but this provides a foundation with which
add-on code can pretend that XIDs never wrap around. This is a severely
trimmed and rewritten version of the xxid patch submitted by Marko Kreen.
Per discussion, the epoch counter seems the only part of xxid that really
needs to be in the core server.
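
As a sketch of how add-on code might use the epoch, the snippet below combines an epoch with a 32-bit XID into a single 64-bit value that keeps increasing across wraparound. The function names `next_epoch` and `extended_xid` are hypothetical, and the wraparound check is deliberately simplified; it is not the server's bookkeeping.

```python
XID_BITS = 32  # width of the transaction ID counter


def next_epoch(epoch: int, previous_xid: int, current_xid: int) -> int:
    """Bump the epoch when the 32-bit counter is seen to have wrapped."""
    return epoch + 1 if current_xid < previous_xid else epoch


def extended_xid(epoch: int, xid: int) -> int:
    """Combine epoch (high bits) and XID (low bits) into one 64-bit value."""
    return (epoch << XID_BITS) | xid


before_wrap = extended_xid(0, 4_294_967_295)
epoch_after = next_epoch(0, previous_xid=4_294_967_295, current_xid=3)
after_wrap = extended_xid(epoch_after, 3)
```

Although the raw XID dropped from 4294967295 to 3, the extended value still compares as later, which is the sense in which such code can pretend XIDs never wrap around.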

35af5422

18 Aug, 2006 1 commit

Implement archive_timeout feature to force xlog file switches to occur no more · e8ea9e95

Committed by Tom Lane on Aug 17, 2006

than N seconds apart. This allows a simple, if not very high performance,
means of guaranteeing that a PITR archive is no more than N seconds behind
real time. Also make pg_current_xlog_location return the WAL Write pointer,
add pg_current_xlog_insert_location to return the Insert pointer, and fix
pg_xlogfile_name_offset to return its results as a two-element record instead
of a smashed-together string, as per recent discussion.

Simon Riggs

e8ea9e95

08 Aug, 2006 1 commit

Make recovery from WAL be restartable, by executing a checkpoint-like · e0028369

Committed by Tom Lane on Aug 07, 2006

operation every so often. This improves the usefulness of PITR log
shipping for hot standby: formerly, if the standby server crashed, it
was necessary to restart it from the last base backup and replay all
the WAL since then. Now it will only need to reread about the same
amount of WAL as the master server would. The behavior might also
come in handy during a long PITR replay sequence. Simon Riggs,
with some editorialization by Tom Lane.

e0028369

06 Aug, 2006 1 commit

Add support for forcing a switch to a new xlog file; cause such a switch · 704ddaaa

Committed by Tom Lane on Aug 06, 2006

to happen automatically during pg_stop_backup(). Add some functions for
interrogating the current xlog insertion point and for easily extracting
WAL filenames from the hex WAL locations displayed by pg_stop_backup
and friends. Simon Riggs with some editorialization by Tom Lane.
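
The mapping from a hex WAL location to a file name can be sketched as follows. This is an approximation under stated assumptions: a location of the form `logid/offset`, the default 16 MB segment size, and a 24-hex-digit file name built from timeline, log ID, and segment number. The helper `wal_file_and_offset` is hypothetical, and it ignores the special handling of a location that falls exactly on a segment boundary.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default segment size (16 MB)


def wal_file_and_offset(location: str, timeline: int = 1):
    """Split 'logid/offset' (hex) into a WAL file name and a byte offset."""
    log_id_hex, rec_off_hex = location.split("/")
    log_id = int(log_id_hex, 16)
    rec_off = int(rec_off_hex, 16)
    segment, offset = divmod(rec_off, XLOG_SEG_SIZE)
    # File name: 8 hex digits each for timeline, log ID, and segment number.
    return f"{timeline:08X}{log_id:08X}{segment:08X}", offset


name, offset = wal_file_and_offset("0/A000020")
```

Here the location `0/A000020` on timeline 1 falls 32 bytes into segment 0x0A of log file 0.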

704ddaaa

30 Jul, 2006 1 commit

Modify snapshot definition so that lazy vacuums are ignored by other · 92c2ecc1

Committed by Alvaro Herrera on Jul 30, 2006

vacuums.  This allows an OLTP-like system with big tables to continue
regular vacuuming on small-but-frequently-updated tables while the
big tables are being vacuumed.

Original patch from Hannu Krossing, rewritten by Tom Lane and updated
by me.

92c2ecc1

14 Jul, 2006 2 commits

Remove 576 references of include files that were not needed. · e0522505

Committed by Bruce Momjian on Jul 14, 2006

e0522505

Allow include files to compile on their own. · a22d76d9

Committed by Bruce Momjian on Jul 13, 2006

Strip unused include files out of include files, and add needed
includes to C files.

The next step is to remove unused include files in C files.

a22d76d9

28 Jun, 2006 1 commit

Put #ifdef NOT_USED around posix_fadvise call. We may want to resurrect · 3c71244b

Committed by Tom Lane on Jun 27, 2006

this someday, but right now it seems that posix_fadvise is immature to
the point of being broken on many platforms ... and we don't have any
benchmark evidence proving it's worth spending time on.

3c71244b

23 Jun, 2006 1 commit

pg_stop_backup was calling XLogArchiveNotify() twice for the newly created · 3a04f53e

Committed by Tom Lane on Jun 22, 2006

backup history file.  Bug introduced by the 8.1 change to make pg_stop_backup
delete older history files.  Per report from Masao Fujii.

3a04f53e

19 Jun, 2006 1 commit

Don't try to call posix_fadvise() unless <fcntl.h> supplies a declaration · 1e8ae136

Committed by Tom Lane on Jun 18, 2006

for it.  Hopefully will fix core dump evidenced by some buildfarm members
since fadvise patch went in.  The actual definition of the function is not
ABI-compatible with compiler's default assumption in the absence of any
declaration, so it's clearly unsafe to try to call it without seeing a
declaration.

1e8ae136

16 Jun, 2006 1 commit

Test for POSIX_FADV_DONTNEED to use posix_fadvise(). · 40bc06fa

Committed by Bruce Momjian on Jun 16, 2006

40bc06fa