1. 27 Oct 2007 (1 commit)
  2. 25 Oct 2007 (1 commit)
  3. 26 Sep 2007 (2 commits)
    • Tom Lane committed · 7a315a09
      Dept. of second thoughts: fix loop in BgBufferSync so that the exit when
      bgwriter_lru_maxpages is exceeded leaves the loop variables in the
      expected state.  In the original coding, we'd fail to advance
      next_to_clean, causing that buffer to be probably-uselessly rechecked next
      time, and also have an off-by-one idea of the number of buffers scanned.
    • Tom Lane committed · 6f5c38dc
      Just-in-time background writing strategy. This code avoids re-scanning
      buffers that cannot possibly need to be cleaned, and estimates how many
      buffers it should try to clean based on moving averages of recent allocation
      requests and density of reusable buffers.  The patch also adds a couple
      more columns to pg_stat_bgwriter to help measure the effectiveness of the
      bgwriter.
      
      Greg Smith, building on his own work and ideas from several other people,
      in particular a much older patch from Itagaki Takahiro.
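
The estimation idea in this commit message can be illustrated with a minimal Python sketch. It is not the bgwriter's actual C code: the function name, the smoothing constant, the multiplier, and the exact formula are all assumptions chosen only to show how a moving average of allocations and a moving average of reusable-buffer density could combine into a cleaning target.

```python
def estimate_buffers_to_clean(recent_allocs, smoothed_alloc, smoothed_density,
                              smoothing_samples=16, multiplier=2.0):
    """Return (new_smoothed_alloc, buffers_needed, buffers_to_scan).

    All names and constants here are hypothetical, for illustration only.
    """
    # Exponential moving average of buffer allocation requests per round.
    smoothed_alloc += (recent_allocs - smoothed_alloc) / smoothing_samples
    # Try to keep a multiple of the expected demand clean ahead of the sweep.
    buffers_needed = int(smoothed_alloc * multiplier)
    # smoothed_density: average number of buffers scanned per reusable buffer.
    buffers_to_scan = int(buffers_needed * smoothed_density)
    return smoothed_alloc, buffers_needed, buffers_to_scan
```

For example, with a previous average of 16 allocations per round, 32 allocations in the latest round, and one reusable buffer per 4 scanned, the sketch nudges the average to 17 and suggests scanning 136 buffers to find 34 clean ones.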
  4. 24 Sep 2007 (1 commit)
  5. 22 Sep 2007 (2 commits)
    • Tom Lane committed · cc59049d
      Improve handling of prune/no-prune decisions by storing a page's oldest
      unpruned XMAX in its header.  At the cost of 4 bytes per page, this keeps us
      from performing heap_page_prune when there's no chance of pruning anything.
      Seems to be necessary per Heikki's preliminary performance testing.
    • Tom Lane committed · da072ab2
      Make some simple performance improvements in TransactionIdIsInProgress().
      For XIDs of our own transaction and subtransactions, it's cheaper to ask
      TransactionIdIsCurrentTransactionId() than to look in shared memory.
      Also, the xids[] work array is always the same size within any given
      process, so malloc it just once instead of doing a palloc/pfree on every
      call; aside from being faster this lets us get rid of some goto's, since
      we no longer have any end-of-function pfree to do.  Both ideas by Heikki.
  6. 21 Sep 2007 (1 commit)
    • Tom Lane committed · 282d2a03
      HOT updates. When we update a tuple without changing any of its indexed
      columns, and the new version can be stored on the same heap page, we no longer
      generate extra index entries for the new version.  Instead, index searches
      follow the HOT-chain links to ensure they find the correct tuple version.
      
      In addition, this patch introduces the ability to "prune" dead tuples on a
      per-page basis, without having to do a complete VACUUM pass to recover space.
      VACUUM is still needed to clean up dead index entries, however.
      
      Pavan Deolasee, with help from a bunch of other people.
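
As a rough illustration of "following the HOT-chain links", the Python sketch below models a heap page as a dictionary of slots and walks from the slot the index points at until it finds a visible version. The `TupleVersion` class, its `visible` flag (a stand-in for a real MVCC visibility check), and `hot_search` are hypothetical names; the real heap code is considerably more involved.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TupleVersion:
    value: str
    visible: bool               # stand-in for a real MVCC visibility test
    next_slot: Optional[int]    # link to the newer version on the same page


def hot_search(page: Dict[int, TupleVersion], root_slot: int) -> Optional[TupleVersion]:
    """Follow the chain from the index-referenced slot to a visible version."""
    slot = root_slot
    seen = set()
    while slot is not None and slot not in seen:
        seen.add(slot)          # guard against a malformed, cyclic chain
        tup = page.get(slot)
        if tup is None:
            return None         # chain points at a slot that no longer exists
        if tup.visible:
            return tup
        slot = tup.next_slot
    return None
```

The point of the design is that the index keeps a single entry for the root slot, and newer versions on the same page are reached through the chain instead of through additional index entries.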
  7. 13 Sep 2007 (1 commit)
    • Tom Lane committed · 68893035
      Redefine the lp_flags field of item pointers as having four states, rather
      than two independent bits (one of which was never used in heap pages anyway,
      or at least hadn't been in a very long time).  This gives us flexibility to
      add the HOT notions of redirected and dead item pointers without requiring
      anything so klugy as magic values of lp_off and lp_len.  The state values
      are chosen so that for the states currently in use (pre-HOT) there is no
      change in the physical representation.
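
A four-state field of this kind fits in two bits. The Python sketch below models it as an enumeration; the state names and numeric values follow what later PostgreSQL sources use in `itemid.h` as far as I recall, and should be treated as an assumption here, since the commit message itself does not list them.

```python
from enum import IntEnum


class LpFlags(IntEnum):
    # Names and values are assumed, not taken from the commit message.
    LP_UNUSED = 0     # slot is free
    LP_NORMAL = 1     # slot points at a tuple (the ordinary pre-HOT "used" state)
    LP_REDIRECT = 2   # HOT: slot redirects to another slot
    LP_DEAD = 3       # HOT: slot is dead and may or may not have storage


def describe(flags: int) -> str:
    """Decode the low two bits of a flags value into a state name."""
    return LpFlags(flags & 0b11).name
```

Keeping "unused" at 0 and "normal" at 1 is one way to satisfy the message's requirement that the states already in use keep their physical representation.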
  8. 09 Sep 2007 (1 commit)
    • Tom Lane committed · 6bd4f401
      Replace the former method of determining snapshot xmax --- to wit, calling
      ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid"
      variable that is updated during transaction commit or abort.  Since
      latestCompletedXid is written only in places that had to lock ProcArrayLock
      exclusively anyway, and is read only in places that had to lock ProcArrayLock
      shared anyway, it adds no new locking requirements to the system despite being
      cluster-wide.  Moreover, removing ReadNewTransactionId from snapshot
      acquisition eliminates the need to take both XidGenLock and ProcArrayLock at
      the same time.  Since XidGenLock is sometimes held across I/O this can be a
      significant win.  Some preliminary benchmarking suggested that this patch has
      no effect on average throughput but can significantly improve the worst-case
      transaction times seen in pgbench.  Concept by Florian Pflug, implementation
      by Tom Lane.
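
The bookkeeping described above can be sketched in Python as follows. This is a toy model under stated assumptions: `ProcArrayModel` and its methods are hypothetical names, XID wraparound is ignored, and the locking is only indicated in comments rather than implemented.

```python
class ProcArrayModel:
    """Toy model: snapshot xmax derived from the latest completed XID."""

    def __init__(self, latest_completed_xid=99):
        self.latest_completed_xid = latest_completed_xid
        self.running = set()

    def end_transaction(self, xid):
        # In the real system this runs with ProcArrayLock held exclusively,
        # which the commit/abort path needed anyway.
        self.running.discard(xid)
        if xid > self.latest_completed_xid:
            self.latest_completed_xid = xid

    def snapshot(self):
        # In the real system this runs with ProcArrayLock held shared;
        # no XID-generation lock is needed to compute xmax.
        xmax = self.latest_completed_xid + 1
        in_progress = sorted(x for x in self.running if x < xmax)
        return xmax, in_progress
```

With transactions 100, 101, and 102 running and 101 completing, a snapshot in this model gets xmax 102 and treats 100 as still in progress; 102 is simply at or beyond xmax.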
  9. 08 Sep 2007 (1 commit)
    • Tom Lane committed · 0a51e707
      Don't take ProcArrayLock while exiting a transaction that has no XID; there is
      no need for serialization against snapshot-taking because the xact doesn't
      affect anyone else's snapshot anyway.  Per discussion.  Also, move various
      info about the interlocking of transactions and snapshots out of code comments
      and into a hopefully-more-cohesive discussion in access/transam/README.
      
      Also, remove a couple of now-obsolete comments about having to force some WAL
      to be written to persuade RecordTransactionCommit to do its thing.
  10. 07 Sep 2007 (1 commit)
  11. 06 Sep 2007 (2 commits)
    • Tom Lane committed · 0ecb4ea7
      Volatile-qualify the ProcArray PGPROC pointer in a bunch of routines
      that examine fields that could change under them.  This is just to make
      really sure that when we are fetching a value 'only once', that's what
      actually happens.  Possibly this is a bug that should be back-patched,
      but in the absence of solid evidence that it's needed, I won't bother.
    • Tom Lane committed · 295e6398
      Implement lazy XID allocation: transactions that do not modify any database
      rows will normally never obtain an XID at all.  We already did things this way
      for subtransactions, but this patch extends the concept to top-level
      transactions.  In applications where there are lots of short read-only
      transactions, this should improve performance noticeably; not so much from
      removal of the actual XID-assignments, as from reduction of overhead that's
      driven by the rate of XID consumption.  We add a concept of a "virtual
      transaction ID" so that active transactions can be uniquely identified even
      if they don't have a regular XID.  This is a much lighter-weight concept:
      uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
      record is made about them.
      
      Florian Pflug, with some editorialization by Tom.
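
A minimal Python sketch of the lazy-allocation idea: every transaction gets a cheap virtual ID at start, and a regular XID is drawn from the shared counter only on its first write. The class names, the `(backend_id, local_counter)` shape of the virtual ID, and the starting XID are assumptions made for illustration.

```python
import itertools

_local_ids = itertools.count(1)   # hypothetical per-process counter for VXIDs


class XidAllocator:
    """Stand-in for the shared, persistent XID counter."""

    def __init__(self, start=100):
        self._next = itertools.count(start)

    def allocate(self):
        return next(self._next)


class Transaction:
    def __init__(self, backend_id, allocator):
        # Virtual ID: cheap, unique only in the short term, never stored on disk.
        self.vxid = (backend_id, next(_local_ids))
        self.xid = None               # no regular XID until the first write
        self._allocator = allocator

    def read(self):
        return self.xid               # reading never consumes an XID

    def write(self):
        if self.xid is None:
            self.xid = self._allocator.allocate()
        return self.xid
```

In this model a read-only transaction finishes with `xid` still `None`, so it never advances the shared counter.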
  12. 28 Aug 2007 (1 commit)
    • Tom Lane committed · 24d4517b
      Improve behavior of log_lock_waits patch. Ensure that something gets logged
      even if the "deadlock detected" ERROR message is suppressed by an exception
      catcher.  Be clearer about the event sequence when a soft deadlock is fixed:
      the fixing process might or might not still have to wait, so log that
      separately.  Fix race condition when someone releases us from the lock partway
      through printing all this junk --- we'd not get confused about our state, but
      the log message sequence could have been misleading, ie, a "still waiting"
      message with no subsequent "acquired" message.  Greg Stark and Tom Lane.
  13. 26 Jul 2007 (3 commits)
    • Tom Lane committed · e4f4a7f5
      Remove FileUnlink(), which wasn't being used anywhere and interacted poorly
      with the recent patch to log temp file sizes at removal time.  Doesn't seem
      worth fixing since it's unused.
      In passing, make a few elog messages conform to the message style guide.
    • Tom Lane committed · 82eed4db
      Arrange to put TOAST tables belonging to temporary tables into special schemas
      named pg_toast_temp_nnn, alongside the pg_temp_nnn schemas used for the temp
      tables themselves.  This allows low-level code such as the relcache to
      recognize that these tables are indeed temporary, which enables various
      optimizations such as not WAL-logging changes and using local rather than
      shared buffers for access.  Aside from obvious performance benefits, this
      provides a solution to bug #3483, in which other backends unexpectedly held
      open file references to temporary tables.  The scheme preserves the property
      that TOAST tables are not in any schema that's normally in the search path,
      so they don't conflict with user table names.
      
      initdb forced because of changes in system view definitions.
    • Tom Lane committed · fdb5b69e
      Suppress warning when compiling with -DPROFILE_PID_DIR: sys/stat.h is
      supposed to be included when using mkdir().
  14. 21 Jul 2007 (1 commit)
    • Tom Lane committed · 04fbe29a
      Fix WAL replay of truncate operations to cope with the possibility that the
      truncated relation was deleted later in the WAL sequence.  Since replay
      normally auto-creates a relation upon its first reference by a WAL log entry,
      failure is seen only if the truncate entry happens to be the first reference
      after the checkpoint we're restarting from; which is a pretty unusual case but
      of course not impossible.  Fix by making truncate entries auto-create like
      the other ones do.  Per report and test case from Dharmendra Goyal.
  15. 17 Jul 2007 (1 commit)
  16. 09 Jul 2007 (1 commit)
    • Tom Lane committed · b09cb0cf
      Remove the pgstat_drop_relation() call from smgr_internal_unlink(), because
      we don't know at that point which relation OID to tell pgstat to forget.
      The code was passing the relfilenode, which is incorrect, and could possibly
      cause some other relation's stats to be zeroed out.  While we could try to
      clean this up, it seems much simpler and more reliable to let the next
      invocation of pgstat_vacuum_tabstat() fix things; which indeed is how it
      worked before I introduced the buggy code into 8.1.3 and later :-(.
      Problem noticed by Itagaki Takahiro, fix is per subsequent discussion.
  17. 03 Jul 2007 (1 commit)
    • Tom Lane committed · 83aaebba
      Fix incorrect comment about the timing of AbsorbFsyncRequests() during
      checkpoint.  The comment claimed that we could do this anytime after
      setting the checkpoint REDO point, but actually BufferSync is relying
      on the assumption that buffers dumped by other backends will be fsync'd
      too.  So we really could not do it any sooner than we are doing it.
  18. 01 Jul 2007 (2 commits)
  19. 30 Jun 2007 (1 commit)
  20. 28 Jun 2007 (1 commit)
    • Tom Lane committed · 867e2c91
      Implement "distributed" checkpoints in which the checkpoint I/O is spread
      over a fairly long period of time, rather than being spat out in a burst.
      This happens only for background checkpoints carried out by the bgwriter;
      other cases, such as a shutdown checkpoint, are still done at full speed.
      
      Remove the "all buffers" scan in the bgwriter, and associated stats
      infrastructure, since this seems no longer very useful when the checkpoint
      itself is properly throttled.
      
      Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas,
      and some minor API editorialization by me.
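
The commit message does not spell out how the writes are paced. One simple way to spread I/O over a target duration is to compare the fraction of buffers written with the fraction of time elapsed and pause whenever the writer is ahead. The Python sketch below shows that rule; the function name and parameters are hypothetical.

```python
def ahead_of_schedule(buffers_written, buffers_total, elapsed_s, target_s):
    """Return True if the checkpoint writer may pause before the next write.

    Hypothetical pacing rule: progress through the buffers should not outrun
    progress through the time budget.
    """
    if buffers_total == 0:
        return True                       # nothing to write
    progress = buffers_written / buffers_total
    time_used = elapsed_s / target_s
    return progress >= time_used
```

A writer loop built on this rule would sleep briefly while the function returns True and write at full speed otherwise, which also covers the "full speed" cases such as a shutdown checkpoint by setting a very small time budget.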
  21. 20 Jun 2007 (2 commits)
    • Tom Lane committed · 9cce91db
      Only log 'process acquired lock' if we actually did get the lock. This
      test seems inessential right now since the only control path for not
      getting the lock is via CHECK_FOR_INTERRUPTS which won't return control
      to ProcSleep, but it would be important if we ever allow the deadlock
      code to kill someone else's transaction instead of our own.
    • Tom Lane committed · 6e072287
      Code review for log_lock_waits patch. Don't try to issue log messages from
      within a signal handler (this might be safe given the relatively narrow code
      range in which the interrupt is enabled, but it seems awfully risky); do issue
      more informative log messages that tell what is being waited for and the exact
      length of the wait; minor other code cleanup.  Greg Stark and Tom Lane
  22. 18 Jun 2007 (1 commit)
  23. 13 Jun 2007 (1 commit)
  24. 09 Jun 2007 (1 commit)
  25. 08 Jun 2007 (2 commits)
    • Tom Lane committed · 6d6d14b6
      Redefine IsTransactionState() to only return true for TRANS_INPROGRESS state,
      which is the only state in which it's safe to initiate database queries.
      It turns out that all but two of the callers thought that's what it meant;
      and the other two were using it as a proxy for "will GetTopTransactionId()
      return a nonzero XID"?  Since it was in fact an unreliable guide to that,
      make those two just invoke GetTopTransactionId() always, then deal with a
      zero result if they get one.
    • Tom Lane committed · 24ee8af5
      Rework temp_tablespaces patch so that temp tablespaces are assigned separately
      for each temp file, rather than once per sort or hashjoin; this allows
      spreading the data of a large sort or join across multiple tablespaces.
      (I remain dubious that this will make any difference in practice, but certain
      people insisted.)  Arrange to cache the results of parsing the GUC variable
      instead of recomputing from scratch on every demand, and push usage of the
      cache down to the bottommost fd.c level.
  26. 04 Jun 2007 (1 commit)
    • Tom Lane committed · acfce502
      Create a GUC parameter temp_tablespaces that allows selection of the
      tablespace(s) in which to store temp tables and temporary files.  This is a
      list to allow spreading the load across multiple tablespaces (a random list
      element is chosen each time a temp object is to be created).  Temp files are
      not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
      directories.
      
      Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
  27. 02 Jun 2007 (2 commits)
    • Tom Lane committed · 964ec46c
      Fix aboriginal bug in BufFileDumpBuffer that would cause it to write the
      wrong data when dumping a bufferload that crosses a component-file boundary.
      This probably has not been seen in the wild because (a) component files are
      normally 1GB apiece and (b) non-block-aligned buffer usage is relatively
      rare.  But it's fairly easy to reproduce a problem if one reduces RELSEG_SIZE
      in a test build.  Kudos to Kurt Harriman for spotting the bug.
    • Tom Lane committed · bd0a2609
      Make CREATE/DROP/RENAME DATABASE wait a little bit to see if other backends
      will exit before failing because of conflicting DB usage.  Per discussion,
      this seems a good idea to help mask the fact that backend exit takes nonzero
      time.  Remove a couple of thereby-obsoleted sleeps in contrib and PL
      regression test sequences.
  28. 31 May 2007 (1 commit)
    • Tom Lane committed · d526575f
      Make large sequential scans and VACUUMs work in a limited-size "ring" of
      buffers, rather than blowing out the whole shared-buffer arena.  Aside from
      avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
      to cause a WAL flush for every page it modified, because we had it hacked to
      use only a single buffer.  Those flushes will now occur only once per
      ring-ful.  The exact ring size, and the threshold for seqscans to switch into
      the ring usage pattern, remain under debate; but the infrastructure seems
      done.  The key bit of infrastructure is a new optional BufferAccessStrategy
      object that can be passed to ReadBuffer operations; this replaces the former
      StrategyHintVacuum API.
      
      This patch also changes the buffer usage-count methodology a bit: we now
      advance usage_count when first pinning a buffer, rather than when last
      unpinning it.  To preserve the behavior that a buffer's lifetime starts to
      decrease when it's released, the clock sweep code is modified to not decrement
      usage_count of pinned buffers.
      
      Work not done in this commit: teach GiST and GIN indexes to use the vacuum
      BufferAccessStrategy for vacuum-driven fetches.
      
      Original patch by Simon, reworked by Heikki and again by Tom.
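
The usage-count change described in this message can be modeled with a small clock sweep in Python. This is a simplified, single-process sketch: the `Buffer` class, the cap `MAX_USAGE`, and the function names are assumptions, and the ring strategy itself is not modeled. It shows only the two rules stated above: the count is advanced when a buffer is pinned, and the sweep does not decrement the count of pinned buffers.

```python
from dataclasses import dataclass

MAX_USAGE = 5   # hypothetical cap on usage_count


@dataclass
class Buffer:
    usage_count: int = 0
    pin_count: int = 0


def pin(buf: Buffer) -> None:
    buf.pin_count += 1
    # Advance usage_count at pin time rather than at unpin time.
    buf.usage_count = min(buf.usage_count + 1, MAX_USAGE)


def unpin(buf: Buffer) -> None:
    buf.pin_count -= 1


def clock_sweep(buffers, hand):
    """Return (victim_index, new_hand); victim_index is None if all are pinned."""
    for _ in range(len(buffers) * (MAX_USAGE + 1)):
        i = hand
        hand = (hand + 1) % len(buffers)
        buf = buffers[i]
        if buf.pin_count > 0:
            continue                 # pinned: skip without decrementing
        if buf.usage_count == 0:
            return i, hand           # unpinned and cold: reuse this buffer
        buf.usage_count -= 1         # unpinned but recently used: age it
    return None, hand
```

Because pinned buffers are skipped untouched, a buffer's count starts to decay only after it is released, which matches the behavior the message says it wants to preserve.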
  29. 27 May 2007 (1 commit)
    • Tom Lane committed · 77947c51
      Fix up pgstats counting of live and dead tuples to recognize that committed
      and aborted transactions have different effects; also teach it not to assume
      that prepared transactions are always committed.
      
      Along the way, simplify the pgstats API by tying counting directly to
      Relations; I cannot detect any redeeming social value in having stats
      pointers in HeapScanDesc and IndexScanDesc structures.  And fix a few
      corner cases in which counts might be missed because the relation's
      pgstat_info pointer hadn't been set.
  30. 03 May 2007 (2 commits)
    • Tom Lane committed · 63735ca8
      Dept. of second thoughts: add comments cautioning against using
      ReadOrZeroBuffer to fetch pages from beyond physical EOF.  This would
      usually work, but would cause problems for md.c if writes occurred
      beyond a segment boundary when the previous segment file hadn't been
      fully extended.
    • Tom Lane committed · 8c3cc86e
      During WAL recovery, when reading a page that we intend to overwrite completely
      from the WAL data, don't bother to physically read it; just have bufmgr.c
      return a zeroed-out buffer instead.  This speeds recovery significantly,
      and also avoids unnecessary failures when a page-to-be-overwritten has corrupt
      page headers on disk.  This replaces a former kluge that accomplished the
      latter by pretending zero_damaged_pages was always ON during WAL recovery;
      which was OK when the kluge was put in, but is unsafe when restoring a WAL
      log that was written with full_page_writes off.
      
      Heikki Linnakangas