1. 14 Aug, 2007 1 commit
    • T
      Fix two bugs induced in VACUUM FULL by async-commit patch. · 647fd9a1
      Tom Lane committed
      First, we cannot assume that XLogAsyncCommitFlush guarantees hint bits will be
      settable, because clog.c's inexact LSN bookkeeping results in windows where a
      previously flushed transaction is considered unhintable because it shares an
      LSN slot with a later unflushed transaction.  But repair_frag requires
      XMIN_COMMITTED to be correct so that it can distinguish tuples moved by the
      current vacuum.  Since not being able to set the bit is an uncommon corner
      case, the most practical way of dealing with it seems to be to abandon
      shrinking (ie, don't invoke repair_frag) when we find a non-dead tuple whose
      XMIN_COMMITTED bit couldn't be set.
      
      Second, it is possible for the same reason that a RECENTLY_DEAD tuple does not
      get its XMAX_COMMITTED bit set during scan_heap.  But by the time repair_frag
      examines the tuple it might be possible to set the bit.  We therefore must
      take buffer content lock when calling HeapTupleSatisfiesVacuum a second time,
      else we can get an Assert failure in SetBufferCommitInfoNeedsSave.  This
      latter bug is latent in existing releases, but I think it cannot actually
      occur without async commit, since the first HeapTupleSatisfiesVacuum call
      should always have set the bit.  So I'm not going to back-patch it.
      
      In passing, reduce the existing "cannot shrink relation" messages from NOTICE
      to LOG level.  The new message must be no higher than LOG if we don't want
      unpredictable regression test failures, and consistency seems like a good
      idea.  Also arrange that only one such message is reported per VACUUM FULL;
      in typical scenarios you could get spammed with many such messages, which
      seems a bit useless.
  2. 04 Aug, 2007 1 commit
    • T
      Switch over to using the src/timezone functions for formatting timestamps · bdd6b622
      Tom Lane committed
      displayed in the postmaster log.  This avoids Windows-specific problems with
      localized time zone names that are in the wrong encoding, and generally seems
      like a good idea to forestall other potential platform-dependent issues.
      To preserve the existing behavior that all backends will log in the same time
      zone, create a new GUC variable log_timezone that can only be changed on a
      system-wide basis, and reference log-related calculations to that zone instead
      of the TimeZone variable.
      
      This fixes the issue reported by Hiroshi Saito that timestamps printed by
      xlog.c startup could be improperly localized on Windows.  We still need a
      simpler patch for that problem in the back branches, however.
  3. 02 Aug, 2007 1 commit
  4. 24 Jul, 2007 1 commit
    • T
      Create a new dedicated Postgres process, "wal writer", which exists to write · ad429572
      Tom Lane committed
      and fsync WAL at convenient intervals.  For the moment it just tries to
      offload this work from backends, but soon it will be responsible for
      guaranteeing a maximum delay before asynchronously-committed transactions
      will be flushed to disk.
      
      This is a portion of Simon Riggs' async-commit patch, committed to CVS
      separately because a background WAL writer seems like it might be a good idea
      independently of the async-commit feature.  I rebased walwriter.c on
      bgwriter.c because it seemed like a more appropriate way of handling signals;
      while the startup/shutdown logic in postmaster.c is more like autovac because
      we want walwriter to quit before we start the shutdown checkpoint.
  5. 01 Jul, 2007 1 commit
  6. 28 Jun, 2007 1 commit
    • T
      Implement "distributed" checkpoints in which the checkpoint I/O is spread · 867e2c91
      Tom Lane committed
      over a fairly long period of time, rather than being spat out in a burst.
      This happens only for background checkpoints carried out by the bgwriter;
      other cases, such as a shutdown checkpoint, are still done at full speed.
      
      Remove the "all buffers" scan in the bgwriter, and associated stats
      infrastructure, since this seems no longer very useful when the checkpoint
      itself is properly throttled.
      
      Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas,
      and some minor API editorialization by me.
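The pacing idea in this commit can be illustrated with a minimal sketch in C. It is not the code from the commit: the function name, its parameters, and the simple progress-versus-time comparison are assumptions chosen for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not from the PostgreSQL source): decide whether a
 * background checkpoint is ahead of schedule and can pause before writing
 * more buffers.
 */
static bool
checkpoint_ahead_of_schedule(int buffers_written, int buffers_total,
                             double elapsed_secs, double target_secs)
{
    double      write_progress;
    double      time_progress;

    /* Degenerate input: never ask the caller to sleep. */
    if (buffers_total <= 0 || target_secs <= 0.0)
        return false;

    /* Fraction of the work done versus fraction of the time budget used. */
    write_progress = (double) buffers_written / (double) buffers_total;
    time_progress = elapsed_secs / target_secs;

    return write_progress > time_progress;
}
```

A writer loop could call this after each buffer and nap briefly while it returns true. A shutdown checkpoint would skip the check and write at full speed.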
  7. 31 May, 2007 3 commits
    • P
      Make some messages more consistent · 7ce9b368
      Peter Eisentraut committed
    • P
      71fb7b90
    • T
      Make large sequential scans and VACUUMs work in a limited-size "ring" of · d526575f
      Tom Lane committed
      buffers, rather than blowing out the whole shared-buffer arena.  Aside from
      avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
      to cause a WAL flush for every page it modified, because we had it hacked to
      use only a single buffer.  Those flushes will now occur only once per
      ring-ful.  The exact ring size, and the threshold for seqscans to switch into
      the ring usage pattern, remain under debate; but the infrastructure seems
      done.  The key bit of infrastructure is a new optional BufferAccessStrategy
      object that can be passed to ReadBuffer operations; this replaces the former
      StrategyHintVacuum API.
      
      This patch also changes the buffer usage-count methodology a bit: we now
      advance usage_count when first pinning a buffer, rather than when last
      unpinning it.  To preserve the behavior that a buffer's lifetime starts to
      decrease when it's released, the clock sweep code is modified to not decrement
      usage_count of pinned buffers.
      
      Work not done in this commit: teach GiST and GIN indexes to use the vacuum
      BufferAccessStrategy for vacuum-driven fetches.
      
      Original patch by Simon, reworked by Heikki and again by Tom.
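The usage-count change described in this commit can be sketched in C as follows. This is a single-process simplification, not the buffer manager's code: the struct, the function names, and the cap `SKETCH_MAX_USAGE` are hypothetical, and "first pinning" is approximated as the transition from unpinned to pinned.

```c
#include <stddef.h>

#define SKETCH_MAX_USAGE 5      /* assumed cap, for illustration only */

typedef struct
{
    int         refcount;       /* number of active pins */
    int         usage_count;    /* recency score consumed by the sweep */
} SketchBuffer;

/* Bump usage_count when the buffer goes from unpinned to pinned. */
static void
sketch_pin(SketchBuffer *buf)
{
    if (buf->refcount == 0 && buf->usage_count < SKETCH_MAX_USAGE)
        buf->usage_count++;
    buf->refcount++;
}

/* Releasing a pin does not touch usage_count. */
static void
sketch_unpin(SketchBuffer *buf)
{
    if (buf->refcount > 0)
        buf->refcount--;
}

/*
 * Clock sweep: return the index of an unpinned buffer whose usage_count has
 * reached zero, or -1 if every buffer is pinned.  Pinned buffers are skipped
 * without being decremented, so their lifetime only starts to shrink once
 * they are released.
 */
static int
sketch_clock_sweep(SketchBuffer *bufs, size_t n, size_t *hand)
{
    size_t      skipped = 0;    /* consecutive pinned buffers seen */

    while (skipped < n)
    {
        size_t      idx = *hand;
        SketchBuffer *buf = &bufs[idx];

        *hand = (*hand + 1) % n;

        if (buf->refcount > 0)
        {
            skipped++;
            continue;
        }
        skipped = 0;
        if (buf->usage_count > 0)
        {
            buf->usage_count--;
            continue;
        }
        return (int) idx;
    }
    return -1;
}
```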
  8. 21 May, 2007 1 commit
    • T
      To support external compression of archived WAL data, add a flag bit to · a8d539f1
      Tom Lane committed
      WAL records that shows whether it is safe to remove full-page images
      (ie, whether or not an on-line backup was in progress when the WAL entry
      was made).  Also make provision for an XLOG_NOOP record type that can be
      used to fill in the extra space when decompressing the data for restore.
      
      This is the portion of Koichi Suzuki's "full page writes" patch that
      has to go into the core database.  The remainder of that work is two
      external compression and decompression programs, which for the time being
      will undergo separate development on pgfoundry.  Per discussion.
      
      Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be
      possible to compress them (the previous coding caused essential info
      to be omitted).  The other commonly-used record types seem OK already,
      with the possible exception of GIN and GIST WAL records, which I don't
      understand well enough to opine on.
  9. 01 May, 2007 1 commit
    • T
      Change the timestamps recorded in transaction commit/abort xlog records · c4320619
      Tom Lane committed
      from time_t to TimestampTz representation.  This provides full gettimeofday()
      resolution of the timestamps, which might be useful when attempting to
      do point-in-time recovery --- previously it was not possible to specify
      the stop point with sub-second resolution.  But mostly this is to get
      rid of TimestampTz-to-time_t conversion overhead during commit.  Per my
      proposal of a day or two back.
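To make the resolution change concrete, here is a minimal sketch of the conversion arithmetic. It assumes the 64-bit integer timestamp representation (microseconds since 2000-01-01 00:00:00 UTC); builds of that era could also store timestamps as floating-point seconds, which this sketch does not cover. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* Seconds between the Unix epoch (1970-01-01) and 2000-01-01, both UTC. */
#define SKETCH_UNIX_TO_PG_EPOCH_SECS INT64_C(946684800)
#define SKETCH_USECS_PER_SEC         INT64_C(1000000)

/*
 * Hypothetical helper: convert a gettimeofday()-style (seconds, microseconds)
 * pair into microseconds since 2000-01-01.
 */
static int64_t
sketch_unix_to_timestamptz(int64_t unix_secs, int32_t unix_usecs)
{
    return (unix_secs - SKETCH_UNIX_TO_PG_EPOCH_SECS) * SKETCH_USECS_PER_SEC
        + unix_usecs;
}

/* A time_t-style value carries whole seconds only, so the fraction is lost. */
static int64_t
sketch_time_t_to_timestamptz(int64_t unix_secs)
{
    return sketch_unix_to_timestamptz(unix_secs, 0);
}
```

Two commits that land in the same second map to the same value through the seconds-only path, while the microsecond path keeps them distinct. That is what allows a sub-second recovery stop point.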
  10. 04 Apr, 2007 1 commit
    • T
      Remove the CheckpointStartLock in favor of having backends show whether they · 9c9b6194
      Tom Lane committed
      are in their commit critical sections via flags in the ProcArray.  Checkpoint
      can watch the ProcArray to determine when it's safe to proceed.  This is
      a considerably better solution to the original problem of race conditions
      between checkpoint and transaction commit: it speeds up commit, since there's
      one less lock to fool with, and it prevents the problem of checkpoint being
      delayed indefinitely when there's a constant flow of commits.  Heikki, with
      some kibitzing from Tom.
  11. 03 Apr, 2007 1 commit
    • T
      Decouple the values of TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE. · b3005276
      Tom Lane committed
      Add the latter to the values checked in pg_control, since it can't be changed
      without invalidating toast table content.  This commit in itself shouldn't
      change any behavior, but it lays some necessary groundwork for experimentation
      with these toast-control numbers.
      
      Note: while TOAST_TUPLE_THRESHOLD can now be changed without initdb, some
      thought still needs to be given to needs_toast_table() in toasting.c before
      unleashing random changes.
  12. 04 Mar, 2007 1 commit
  13. 14 Feb, 2007 1 commit
  14. 08 Feb, 2007 2 commits
  15. 02 Feb, 2007 1 commit
    • B
      Wording cleanup for error messages. Also change can't -> cannot. · 8b4ff8b6
      Bruce Momjian committed
      Standard English uses "may", "can", and "might" in different ways:
      
              may - permission, "You may borrow my rake."
      
              can - ability, "I can lift that log."
      
              might - possibility, "It might rain today."
      
      Unfortunately, in conversational English, their use is often mixed, as
      in, "You may use this variable to do X", when in fact, "can" is a better
      choice.  Similarly, "It may crash" is better stated, "It might crash".
  16. 06 Jan, 2007 1 commit
  17. 09 Dec, 2006 1 commit
    • T
      Remove the logId/logSeg fields from pg_control, because they are not needed · 0cb91ccb
      Tom Lane committed
      in normal operation, and we can avoid rewriting pg_control at every log
      segment switch if we don't insist that these values be valid.  Reducing
      the number of pg_control updates is a good idea for both performance and
      reliability.  It does make pg_resetxlog's life a bit harder, but that seems
      a good tradeoff; and anyway the change to pg_resetxlog amounts to automating
      something people formerly needed to do by hand, namely look at the existing
      pg_xlog files to make sure the new WAL start point was past them.
      
      In passing, change the wording of xlog.c's "database system was interrupted"
      messages: describe the pg_control timestamp as "last known up at" rather than
      implying it is the exact time of service interruption.  With this change the
      timestamp will generally be the time of the last checkpoint, which could be
      many minutes before the failure; and we've already seen indications that
      people tend to misinterpret the old wording.
      
      initdb forced due to change in pg_control layout.  Simon Riggs and Tom Lane
  18. 01 Dec, 2006 1 commit
    • T
      Minor adjustments to make failures in startup/shutdown behave more cleanly. · 5f60086e
      Tom Lane committed
      StartupXLOG and ShutdownXLOG no longer need to be critical sections, because
      in all contexts where they are invoked, elog(ERROR) would be translated to
      elog(FATAL) anyway.  (One change in bgwriter.c is needed to make this true:
      set ExitOnAnyError before trying to exit.  This is a good fix anyway since
      the existing code would have gone into an infinite loop on elog(ERROR) during
      shutdown.)  That avoids a misleading report of PANIC during semi-orderly
      failures.  Modify the postmaster to include the startup process in the set of
      processes that get SIGTERM when a fast shutdown is requested, and also fix it
      to not try to restart the bgwriter if the bgwriter fails while trying to write
      the shutdown checkpoint.  Net result is that "pg_ctl stop -m fast" does
      something reasonable for a system in warm standby mode, and so should Unix
      system shutdown (ie, universal SIGTERM).  Per gripe from Stephen Harris and
      some corner-case testing of my own.
  19. 22 Nov, 2006 1 commit
    • T
      On systems that have setsid(2) (which should be just about everything except · 3ad0728c
      Tom Lane committed
      Windows), arrange for each postmaster child process to be its own process
      group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
      process group not only the direct child process.  This provides saner behavior
      for archive and recovery scripts; in particular, it's possible to shut down a
      warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
      of SIGQUIT to the startup subprocess will result in killing the waiting
      recovery_command.  Also, this makes Query Cancel and statement_timeout apply
      to scripts being run from backends via system().  (There is no support in the
      core backend for that, but it's widely done using untrusted PLs.)  Per gripe
      from Stephen Harris and subsequent discussion.
  20. 16 Nov, 2006 1 commit
  21. 11 Nov, 2006 1 commit
  22. 09 Nov, 2006 1 commit
    • T
      Change Windows rename and unlink substitutes so that they time out after · dcbdf9b1
      Tom Lane committed
      30 seconds instead of retrying forever.  Also modify xlog.c so that if
      it fails to rename an old xlog segment up to a future slot, it will
      unlink the segment instead.  Per discussion of bug #2712, in which it
      became apparent that Windows can handle unlinking a file that's being
      held open, but not renaming it.
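The "retry with a deadline, then fall back to unlink" behavior in this commit can be sketched in portable C. This is an illustration under assumptions, not the Windows substitute itself: the function names and parameters are hypothetical, the sketch retries on any failure rather than only on sharing-related errors, and it sleeps with POSIX `nanosleep` instead of a Windows call.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: try rename() up to max_tries times, sleeping sleep_ms
 * between attempts.  Returns 0 on success and -1 once the tries run out,
 * instead of retrying forever.
 */
static int
sketch_rename_with_timeout(const char *from, const char *to,
                           int max_tries, long sleep_ms)
{
    struct timespec delay;
    int         attempt;

    delay.tv_sec = sleep_ms / 1000;
    delay.tv_nsec = (sleep_ms % 1000) * 1000000L;

    for (attempt = 0; attempt < max_tries; attempt++)
    {
        if (rename(from, to) == 0)
            return 0;
        /* No point sleeping after the final failed attempt. */
        if (attempt + 1 < max_tries)
            nanosleep(&delay, NULL);
    }
    return -1;
}

/*
 * Hypothetical caller: recycle an old segment under a future name, and remove
 * it instead if the rename never succeeds.
 */
static int
sketch_recycle_or_unlink(const char *old_path, const char *future_path,
                         int max_tries, long sleep_ms)
{
    if (sketch_rename_with_timeout(old_path, future_path,
                                   max_tries, sleep_ms) == 0)
        return 0;
    return remove(old_path);
}
```

For example, 300 tries with a 100 ms sleep would approximate the 30-second limit mentioned in the commit message.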
  23. 06 Nov, 2006 1 commit
    • T
      Fix recently-understood problems with handling of XID freezing, particularly · 48188e16
      Tom Lane committed
      in PITR scenarios.  We now WAL-log the replacement of old XIDs with
      FrozenTransactionId, so that such replacement is guaranteed to propagate to
      PITR slave databases.  Also, rather than relying on hint-bit updates to be
      preserved, pg_clog is not truncated until all instances of an XID are known to
      have been replaced by FrozenTransactionId.  Add new GUC variables and
      pg_autovacuum columns to allow management of the freezing policy, so that
      users can trade off the size of pg_clog against the amount of freezing work
      done.  Revise the already-existing code that forces autovacuum of tables
      approaching the wraparound point to make it more bulletproof; also, revise the
      autovacuum logic so that anti-wraparound vacuuming is done per-table rather
      than per-database.  initdb forced because of changes in pg_class, pg_database,
      and pg_autovacuum catalogs.  Heikki Linnakangas, Simon Riggs, and Tom Lane.
  24. 19 Oct, 2006 1 commit
  25. 07 Oct, 2006 1 commit
  26. 04 Oct, 2006 1 commit
  27. 22 Aug, 2006 1 commit
    • T
      Make the server track an 'XID epoch', that is, maintain higher-order bits · 35af5422
      Tom Lane committed
      of the transaction ID counter.  Nothing is done with the epoch except to
      store it in checkpoint records, but this provides a foundation with which
      add-on code can pretend that XIDs never wrap around.  This is a severely
      trimmed and rewritten version of the xxid patch submitted by Marko Kreen.
      Per discussion, the epoch counter seems the only part of xxid that really
      needs to be in the core server.
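As an illustration of how add-on code might use the epoch, the following sketch combines it with a 32-bit XID into a 64-bit value that keeps increasing across wraparound. The function name is hypothetical and this is not code from the commit. It also assumes the caller reads the epoch and the XID consistently, which a real add-on would have to ensure.

```c
#include <stdint.h>

/*
 * Hypothetical helper: treat the epoch as the high-order 32 bits of a
 * 64-bit transaction counter.
 */
static uint64_t
sketch_extended_xid(uint32_t epoch, uint32_t xid)
{
    return ((uint64_t) epoch << 32) | (uint64_t) xid;
}
```

With this scheme, an XID issued just after a wraparound (epoch 1, small xid) compares greater than one issued just before it (epoch 0, large xid).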
  28. 18 Aug, 2006 1 commit
    • T
      Implement archive_timeout feature to force xlog file switches to occur no more · e8ea9e95
      Tom Lane committed
      than N seconds apart.  This allows a simple, if not very high performance,
      means of guaranteeing that a PITR archive is no more than N seconds behind
      real time.  Also make pg_current_xlog_location return the WAL Write pointer,
      add pg_current_xlog_insert_location to return the Insert pointer, and fix
      pg_xlogfile_name_offset to return its results as a two-element record instead
      of a smashed-together string, as per recent discussion.
      
      Simon Riggs
  29. 08 Aug, 2006 1 commit
    • T
      Make recovery from WAL be restartable, by executing a checkpoint-like · e0028369
      Tom Lane committed
      operation every so often.  This improves the usefulness of PITR log
      shipping for hot standby: formerly, if the standby server crashed, it
      was necessary to restart it from the last base backup and replay all
      the WAL since then.  Now it will only need to reread about the same
      amount of WAL as the master server would.  The behavior might also
      come in handy during a long PITR replay sequence.  Simon Riggs,
      with some editorialization by Tom Lane.
  30. 06 Aug, 2006 1 commit
    • T
      Add support for forcing a switch to a new xlog file; cause such a switch · 704ddaaa
      Tom Lane committed
      to happen automatically during pg_stop_backup().  Add some functions for
      interrogating the current xlog insertion point and for easily extracting
      WAL filenames from the hex WAL locations displayed by pg_stop_backup
      and friends.  Simon Riggs with some editorialization by Tom Lane.
  31. 30 Jul, 2006 1 commit
  32. 14 Jul, 2006 2 commits
  33. 28 Jun, 2006 1 commit
  34. 23 Jun, 2006 1 commit
  35. 19 Jun, 2006 1 commit
    • T
      Don't try to call posix_fadvise() unless <fcntl.h> supplies a declaration · 1e8ae136
      Tom Lane committed
      for it.  Hopefully will fix core dump evidenced by some buildfarm members
      since fadvise patch went in.  The actual definition of the function is not
      ABI-compatible with compiler's default assumption in the absence of any
      declaration, so it's clearly unsafe to try to call it without seeing a
      declaration.
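The guard described in this commit might look roughly like the sketch below. The macro `SKETCH_HAVE_POSIX_FADVISE_DECL` is a hypothetical stand-in for a configure-time check that `<fcntl.h>` declares the function, and the choice of `POSIX_FADV_DONTNEED` is only an example of an advice value. Without a declaration, a compiler that accepts the call would pass the `off_t` arguments using its default assumptions, which is the ABI mismatch the message refers to.

```c
#include <fcntl.h>

/*
 * Hypothetical wrapper: call posix_fadvise() only when the build has
 * confirmed that a declaration is visible; otherwise do nothing.
 */
static int
sketch_advise_dontneed(int fd)
{
#if defined(SKETCH_HAVE_POSIX_FADVISE_DECL) && defined(POSIX_FADV_DONTNEED)
    /* Declaration is visible, so argument types are passed correctly. */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#else
    /* No declaration seen: skip the hint rather than risk a bad call. */
    (void) fd;
    return 0;
#endif
}
```

Since the advice is only a performance hint, skipping it on platforms without a declaration changes no results.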
  36. 16 Jun, 2006 1 commit