1. 10 Feb, 2010 1 commit
    • cbe9d6be · Tom Lane
      Fix up rickety handling of relation-truncation interlocks.
      Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr
      relation entries, so that they will get reset to InvalidBlockNumber whenever
      an smgr-level flush happens.  Because we now send smgr invalidation messages
      immediately (not at end of transaction) when a relation truncation occurs,
      this ensures that other backends will reset their values before they next
      access the relation.  We no longer need the unreliable assumption that a
      VACUUM that's doing a truncation will hold its AccessExclusive lock until
      commit --- in fact, we can intentionally release that lock as soon as we've
      completed the truncation.  This patch therefore reverts (most of) Alvaro's
      patch of 2009-11-10, as well as my marginal hacking on it yesterday.  We can
      also get rid of assorted no-longer-needed relcache flushes, which are far more
      expensive than an smgr flush because they kill a lot more state.
      
      In passing this patch fixes smgr_redo's failure to perform visibility-map
      truncation, and cleans up some rather dubious assumptions in freespace.c and
      visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of
      date.
  2. 09 Feb, 2010 1 commit
  3. 08 Feb, 2010 1 commit
    • 0a469c87 · Tom Lane
      Remove old-style VACUUM FULL (which was known for a little while as
      VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
      Per discussion, the use case for this method of vacuuming is no longer large
      enough to justify maintaining it; not to mention that we don't wish to invest
      the work that would be needed to make it play nicely with Hot Standby.
      
      Aside from the code directly related to old-style VACUUM FULL, this commit
      removes support for certain WAL record types that could only be generated
      within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
      nontransactional generation of cache invalidation sinval messages (the last
      being the sticking point for Hot Standby).
      
      We still have to retain all code that copes with finding HEAP_MOVED_OFF and
      HEAP_MOVED_IN flag bits on existing tuples.  This can't be removed as long
      as we want to support in-place update from pre-9.0 databases.
  4. 28 Jan, 2010 1 commit
  5. 03 Jan, 2010 1 commit
  6. 31 Dec, 2009 1 commit
    • 48c192c1 · Tom Lane
      Revise pgstat's tracking of tuple changes to improve the reliability of
      decisions about when to auto-analyze.
      
      The previous code depended on n_live_tuples + n_dead_tuples - last_anl_tuples,
      where all three of these numbers could be bad estimates from ANALYZE itself.
      Even worse, in the presence of a steady flow of HOT updates and matching
      HOT-tuple reclamations, auto-analyze might never trigger at all, even if all
      three numbers are exactly right, because n_dead_tuples could hold steady.
      
      To fix, replace last_anl_tuples with an accurately tracked count of the total
      number of committed tuple inserts + updates + deletes since the last ANALYZE
      on the table.  This can still be compared to the same threshold as before, but
      it's much more trustworthy than the old computation.  Tracking this requires
      one more intra-transaction counter per modified table within backends, but no
      additional memory space in the stats collector.  There probably isn't any
      measurable speed difference; if anything it might be a bit faster than before,
      since I was able to eliminate some per-tuple arithmetic operations in favor of
      adding sums once per (sub)transaction.
      
      Also, simplify the logic around pgstat vacuum and analyze reporting messages
      by not trying to fold VACUUM ANALYZE into a single pgstat message.
      
      The original thought behind this patch was to allow scheduling of analyzes
      on parent tables by artificially inflating their changes_since_analyze count.
      I've left that for a separate patch since this change seems to stand on its
      own merit.
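
The counting scheme described in this commit can be sketched as follows. This is a minimal illustration, not the pgstat code: the struct and function names are hypothetical, and the threshold form (a base value plus a scale factor times the estimated row count) is an assumption taken from how the autovacuum analyze threshold is usually documented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-table statistics entry; field names are illustrative. */
typedef struct TableChangeStats
{
    int64_t changes_since_analyze;  /* committed inserts + updates + deletes */
    double  reltuples;              /* estimated row count for the table */
} TableChangeStats;

/*
 * Fold the per-table sums of one committed (sub)transaction into the running
 * count.  Adding sums once per transaction, rather than doing arithmetic per
 * tuple, mirrors the saving the commit message mentions.
 */
static void
add_committed_changes(TableChangeStats *stats,
                      int64_t inserted, int64_t updated, int64_t deleted)
{
    stats->changes_since_analyze += inserted + updated + deleted;
}

/*
 * Decide whether auto-analyze should fire.  The threshold formula is an
 * assumption: base_threshold + scale_factor * reltuples.
 */
static bool
needs_auto_analyze(const TableChangeStats *stats,
                   int64_t base_threshold, double scale_factor)
{
    double threshold = (double) base_threshold + scale_factor * stats->reltuples;

    return (double) stats->changes_since_analyze > threshold;
}

/* ANALYZE resets the counter, so the next decision starts from zero. */
static void
reset_after_analyze(TableChangeStats *stats)
{
    stats->changes_since_analyze = 0;
}
```

Because the counter only ever grows between ANALYZE runs, a steady stream of HOT updates still pushes it over the threshold eventually, even when the dead-tuple count holds steady.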
  7. 19 Dec, 2009 1 commit
    • efc16ea5 · Simon Riggs
      Allow read only connections during recovery, known as Hot Standby.
      Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.
      
      New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.
      
      This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.
      
      Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.
      
      Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
  8. 17 Nov, 2009 1 commit
  9. 11 Nov, 2009 1 commit
    • e7ec0222 · Alvaro Herrera
      Fix longstanding problems in VACUUM caused by untimely interruptions
      In VACUUM FULL, an interrupt after the initial transaction has been recorded
      as committed can cause postmaster to restart with the following error message:
      PANIC: cannot abort transaction NNNN, it was already committed
      This problem has been reported many times.
      
      In lazy VACUUM, an interrupt after the table has been truncated by
      lazy_truncate_heap causes other backends' relcache to still point to the
      removed pages; this can cause future INSERT and UPDATE queries to error out
      with the following error message:
      could not read block XX of relation 1663/NNN/MMMM: read only 0 of 8192 bytes
      The window to this race condition is extremely narrow, but it has been seen in
      the wild involving a cancelled autovacuum process.
      
      The solution for both problems is to inhibit interrupts in both operations
      until after the respective transactions have been committed.  It's not a
      complete solution, because the transaction could theoretically be aborted by
      some other error, but at least fixes the most common causes of both problems.
  10. 24 Aug, 2009 1 commit
    • 7fc7a7c4 · Tom Lane
      Fix a violation of WAL coding rules in the recent patch to include an
      "all tuples visible" flag in heap page headers.  The flag update *must*
      be applied before calling XLogInsert, but heap_update and the tuple
      moving routines in VACUUM FULL were ignoring this rule.  A crash and
      replay could therefore leave the flag incorrectly set, causing rows
      to appear visible in seqscans when they should not be.  This might explain
      recent reports of data corruption from Jeff Ross and others.
      
      In passing, do a bit of editorialization on comments in visibilitymap.c.
  11. 11 Jun, 2009 1 commit
  12. 07 Jun, 2009 1 commit
    • 32ea2363 · Tom Lane
      Improve the IndexVacuumInfo/IndexBulkDeleteResult API to allow somewhat sane
      behavior in cases where we don't know the heap tuple count accurately; in
      particular partial vacuum, but this also makes the API a bit more useful
      for ANALYZE.  This patch adds "estimated_count" flags to both structs so
      that an approximate count can be flagged as such, and adjusts the logic
      so that approximate counts are not used for updating pg_class.reltuples.
      
      This fixes my previous complaint that VACUUM was putting ridiculous values
      into pg_class.reltuples for indexes.  The actual impact of that bug is
      limited, because the planner only pays attention to reltuples for an index
      if the index is partial; which probably explains why beta testers hadn't
      noticed a degradation in plan quality from it.  But it needs to be fixed.
      
      The whole thing is a bit messy and should be redesigned in future, because
      reltuples now has the potential to drift quite far away from reality when
      a long period elapses with no non-partial vacuums.  But this is as good as
      it's going to get for 8.4.
  13. 25 Mar, 2009 1 commit
    • ff301d6e · Tom Lane
      Implement "fastupdate" support for GIN indexes, in which we try to accumulate
      multiple index entries in a holding area before adding them to the main index
      structure.  This helps because bulk insert is (usually) significantly faster
      than retail insert for GIN.
      
      This patch also removes GIN support for amgettuple-style index scans.  The
      API defined for amgettuple is difficult to support with fastupdate, and
      the previously committed partial-match feature didn't really work with
      it either.  We might eventually figure a way to put back amgettuple
      support, but it won't happen for 8.4.
      
      catversion bumped because of change in GIN's pg_am entry, and because
      the format of GIN indexes changed on-disk (there's a metapage now,
      and possibly a pending list).
      
      Teodor Sigaev
  14. 23 Jan, 2009 1 commit
  15. 16 Jan, 2009 1 commit
  16. 06 Jan, 2009 1 commit
  17. 02 Jan, 2009 1 commit
  18. 17 Dec, 2008 1 commit
    • dcf84099 · Heikki Linnakangas
      Don't reset pg_class.reltuples and relpages in VACUUM, if any pages were
      skipped. We could update relpages anyway, but it seems better to only
      update it together with reltuples, because we use the reltuples/relpages
      ratio in the planner. Also don't update n_live_tuples in pgstat.
      
      ANALYZE in VACUUM ANALYZE now needs to update pg_class, if the
      VACUUM-phase didn't do so. Added some boolean-passing to let analyze_rel
      know if it should update pg_class or not.
      
      I also moved the relcache invalidation (to update rd_targblock) from
      vac_update_relstats to where RelationTruncate is called, because
      vac_update_relstats is not called for partial vacuums anymore. It's more
      obvious to send the invalidation close to the truncation that requires it.
      
      Per report by Ned T. Crigler.
  19. 04 Dec, 2008 1 commit
    • 7537f52a · Heikki Linnakangas
      Utilize the visibility map in autovacuum, too. There was an oversight in
      the visibility map patch that because autovacuum always sets
      VacuumStmt->freeze_min_age, visibility map was never used for autovacuum,
      only for manually launched vacuums. This patch introduces a new scan_all
      field to VacuumStmt, indicating explicitly whether the visibility map
      should be used, or the whole relation should be scanned, to advance
      relfrozenxid. Anti-wraparound vacuums still need to scan all pages.
  20. 03 Dec, 2008 1 commit
    • 608195a3 · Heikki Linnakangas
      Introduce visibility map. The visibility map is a bitmap with one bit per
      heap page, where a set bit indicates that all tuples on the page are
      visible to all transactions, and the page therefore doesn't need
      vacuuming. It is stored in a new relation fork.
      
      Lazy vacuum uses the visibility map to skip pages that don't need
      vacuuming. Vacuum is also responsible for setting the bits in the map.
      In the future, this can hopefully be used to implement index-only-scans,
      but we can't currently guarantee that the visibility map is always 100%
      up-to-date.
      
      In addition to the visibility map, there's a new PD_ALL_VISIBLE flag on
      each heap page, also indicating that all tuples on the page are visible to
      all transactions. It's important that this flag is kept up-to-date. It
      is also used to skip visibility tests in sequential scans, which gives a
      small performance gain on seqscans.
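
The one-bit-per-heap-page idea can be sketched with a plain in-memory bitmap. This is a minimal illustration under stated assumptions: the real map lives in a relation fork and is accessed through the buffer manager, and every name below is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory stand-in for the visibility map fork. */
typedef struct VisMapSketch
{
    uint8_t  *bits;     /* one bit per heap page; 1 = all tuples visible to all */
    uint32_t  npages;   /* number of heap pages covered */
} VisMapSketch;

/* Allocate a map with every bit clear, meaning every page needs a visit. */
static bool
vm_sketch_init(VisMapSketch *vm, uint32_t npages)
{
    size_t nbytes = ((size_t) npages + 7) / 8;

    vm->bits = calloc(nbytes > 0 ? nbytes : 1, 1);
    vm->npages = npages;
    return vm->bits != NULL;
}

/* Vacuum sets the bit once it has verified the page is all-visible. */
static void
vm_sketch_set(VisMapSketch *vm, uint32_t page)
{
    vm->bits[page / 8] |= (uint8_t) (1u << (page % 8));
}

/* Assumption: any later modification of the page must clear the bit again. */
static void
vm_sketch_clear(VisMapSketch *vm, uint32_t page)
{
    vm->bits[page / 8] &= (uint8_t) ~(1u << (page % 8));
}

static bool
vm_sketch_test(const VisMapSketch *vm, uint32_t page)
{
    return ((vm->bits[page / 8] >> (page % 8)) & 1u) != 0;
}

/* Count the pages a lazy vacuum would still have to visit. */
static uint32_t
vm_sketch_pages_to_scan(const VisMapSketch *vm)
{
    uint32_t count = 0;

    for (uint32_t page = 0; page < vm->npages; page++)
        if (!vm_sketch_test(vm, page))
            count++;
    return count;
}
```

A clear bit is always safe, since it only costs an extra page visit; a wrongly set bit is the dangerous case, which is why the commit message stresses keeping the flag up to date.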
  21. 19 Nov, 2008 1 commit
    • 33960006 · Heikki Linnakangas
      Rethink the way FSM truncation works. Instead of WAL-logging FSM
      truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To
      make that cleaner from a modularity point of view, move the WAL-logging one
      level up to RelationTruncate, and move RelationTruncate and all the
      related WAL-logging to new src/backend/catalog/storage.c file. Introduce
      new RelationCreateStorage and RelationDropStorage functions that are used
      instead of calling smgrcreate/smgrscheduleunlink directly. Move the
      pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new
      functions. This leaves smgr.c as a thin wrapper around md.c; all the
      transactional stuff is now in storage.c.
      
      This will make it easier to add new forks with similar truncation logic,
      like the visibility map.
  22. 10 Nov, 2008 1 commit
  23. 31 Oct, 2008 1 commit
    • 19c8dc83 · Heikki Linnakangas
      Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer
      functions into one ReadBufferExtended function that takes the strategy
      and mode as arguments. There are three modes: RBM_NORMAL, which is the default
      used by plain ReadBuffer(); RBM_ZERO, which replaces ZeroOrReadBuffer; and
      a new mode, RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages
      without throwing an error. The FSM needs the new mode to recover from
      corrupt pages, which could happen if we crash after extending an FSM file
      and the new page is "torn".
      
      Add fork number to some error messages in bufmgr.c, that still lacked it.
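
The three read modes named in this commit can be illustrated with a toy page reader. This is a sketch under stated assumptions, not the bufmgr.c logic: the function, its parameters, and the `header_ok` flag (standing in for whatever validity check the buffer manager applies) are hypothetical; only the mode names come from the commit message.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192      /* assumed page size for the sketch */

typedef enum
{
    RBM_NORMAL,         /* read the page; an invalid page is an error */
    RBM_ZERO,           /* hand back a zeroed page without reading */
    RBM_ZERO_ON_ERROR   /* read the page; zero it instead of failing */
} ReadMode;

/*
 * Hypothetical reader.  Returns false only when the caller should see an
 * error; on success "out" holds the page contents for the chosen mode.
 */
static bool
read_page_sketch(ReadMode mode, const unsigned char *on_disk,
                 bool header_ok, unsigned char *out)
{
    if (mode == RBM_ZERO)
    {
        memset(out, 0, SKETCH_BLCKSZ);
        return true;
    }

    if (!header_ok)
    {
        if (mode == RBM_ZERO_ON_ERROR)
        {
            /* Treat the torn or corrupt page as empty and carry on. */
            memset(out, 0, SKETCH_BLCKSZ);
            return true;
        }
        return false;   /* RBM_NORMAL: report the corrupt page */
    }

    memcpy(out, on_disk, SKETCH_BLCKSZ);
    return true;
}
```

Zeroing on error suits the FSM because its contents are only hints: a page that reads as empty loses some free-space information but cannot produce wrong query results.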
  24. 30 Sep, 2008 1 commit
    • 15c121b3 · Heikki Linnakangas
      Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the
      free space information is stored in a dedicated FSM relation fork, with each
      relation (except for hash indexes; they don't use FSM).
      
      This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any
      trace of them from the backend, initdb, and documentation.
      
      Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
      introduce a new variant of the get_raw_page(regclass, int4, int4) function in
      contrib/pageinspect that lets you return pages from any relation fork, and
      a new fsm_page_contents() function to inspect the new FSM pages.
  25. 12 May, 2008 1 commit
    • f8c4d7db · Alvaro Herrera
      Restructure some header files a bit, in particular heapam.h, by removing some
      unnecessary #include lines in it.  Also, move some tuple routine prototypes and
      macros to htup.h, which allows removal of heapam.h inclusion from some .c
      files.
      
      For this to work, a new header file access/sysattr.h needed to be created,
      initially containing attribute numbers of system columns, for pg_dump usage.
      
      While at it, make contrib ltree, intarray and hstore header files more
      consistent with our header style.
  26. 27 Mar, 2008 1 commit
  27. 25 Mar, 2008 1 commit
  28. 10 Mar, 2008 1 commit
  29. 02 Jan, 2008 1 commit
  30. 16 Nov, 2007 1 commit
  31. 27 Sep, 2007 1 commit
  32. 24 Sep, 2007 2 commits
    • 58536626 · Alvaro Herrera
      Reduce the size of memory allocations by lazy vacuum when processing a small
      table, by allocating just enough for a hardcoded number of dead tuples per
      page.  The current estimate is 200 dead tuples per page.
      
      Per reports from Jeff Amiel, Erik Jones and Marko Kreen, and subsequent
      discussion.
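
The sizing rule can be sketched as a simple minimum of two limits. This is an illustration only: the 200-per-page figure comes from the commit message, while the memory-budget cap, the 6-byte tuple-identifier size, and the function name are assumptions made for the example.

```c
#include <stdint.h>

#define DEAD_TUPLES_PER_PAGE 200    /* estimate quoted in the commit message */
#define SKETCH_TID_SIZE      6      /* assumed bytes per dead-tuple identifier */

/*
 * Hypothetical sizing helper: how many dead-tuple slots to allocate for a
 * table of "nblocks" pages under a memory budget of "work_mem_bytes".
 */
static uint64_t
max_dead_tuples_sketch(uint64_t work_mem_bytes, uint32_t nblocks)
{
    uint64_t by_memory = work_mem_bytes / SKETCH_TID_SIZE;
    uint64_t by_table  = (uint64_t) nblocks * DEAD_TUPLES_PER_PAGE;
    uint64_t slots     = by_table < by_memory ? by_table : by_memory;

    /* Keep at least one slot so an empty table still gets a valid array. */
    return slots > 0 ? slots : 1;
}
```

With a 16 MB budget, a 10-page table needs only 2000 slots (about 12 kB) instead of the roughly 2.8 million slots the budget alone would allow.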
    • 48f7e643 · Tom Lane
      Simplify and rename some GUC variables, per various recent discussions:
      * stats_start_collector goes away; we always start the collector process,
      unless prevented by a problem with setting up the stats UDP socket.
      
      * stats_reset_on_server_start goes away; it seems useless in view of the
      availability of pg_stat_reset().
      
      * stats_block_level and stats_row_level are merged into a single variable
      "track_counts", which controls all reports sent to the collector process.
      
      * stats_command_string is renamed to track_activities.
      
      * log_autovacuum is renamed to log_autovacuum_min_duration to better reflect
      its meaning.
      
      The log_autovacuum change is not a compatibility issue since it didn't exist
      before 8.3 anyway.  The other changes need to be release-noted.
  33. 21 Sep, 2007 2 commits
    • eb5f4d6c · Tom Lane
      Revert ill-fated patch to release exclusive lock early after vacuum
      truncates a table.  Introduces race condition, as shown by buildfarm
      failures.
    • 282d2a03 · Tom Lane
      HOT updates. When we update a tuple without changing any of its indexed
      columns, and the new version can be stored on the same heap page, we no longer
      generate extra index entries for the new version.  Instead, index searches
      follow the HOT-chain links to ensure they find the correct tuple version.
      
      In addition, this patch introduces the ability to "prune" dead tuples on a
      per-page basis, without having to do a complete VACUUM pass to recover space.
      VACUUM is still needed to clean up dead index entries, however.
      
      Pavan Deolasee, with help from a bunch of other people.
  34. 16 Sep, 2007 1 commit
    • 43b0c918 · Tom Lane
      Fix aboriginal mistake in lazy VACUUM's code for truncating away
      no-longer-needed pages at the end of a table.  We thought we could throw away
      pages containing HEAPTUPLE_DEAD tuples; but this is not so, because such
      tuples very likely have index entries pointing at them, and we wouldn't have
      removed the index entries.  The problem only emerges in a somewhat unlikely
      race condition: the dead tuples have to have been inserted by a transaction
      that later aborted, and this has to have happened between VACUUM's initial
      scan of the page and then rechecking it for empty in count_nondeletable_pages.
      But that timespan will include an index-cleaning pass, so it's not all that
      hard to hit.  This seems to explain a couple of previously unsolved bug
      reports.
  35. 13 Sep, 2007 1 commit
    • 68893035 · Tom Lane
      Redefine the lp_flags field of item pointers as having four states, rather
      than two independent bits (one of which was never used in heap pages anyway,
      or at least hadn't been in a very long time).  This gives us flexibility to
      add the HOT notions of redirected and dead item pointers without requiring
      anything so klugy as magic values of lp_off and lp_len.  The state values
      are chosen so that for the states currently in use (pre-HOT) there is no
      change in the physical representation.
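
The four-state encoding can be sketched as a two-bit field in a packed line pointer. This is an illustration under stated assumptions: the state names and numeric values follow the convention commonly seen in PostgreSQL's item identifier header but should be checked against the source tree, and the struct and helper names are hypothetical.

```c
#include <stdbool.h>

/* Assumed state values; 0 and 1 match the two states in use before HOT. */
typedef enum
{
    LP_UNUSED   = 0,    /* slot is free */
    LP_NORMAL   = 1,    /* slot points at tuple storage */
    LP_REDIRECT = 2,    /* HOT: slot forwards to another line pointer */
    LP_DEAD     = 3     /* HOT: slot is dead and may have no storage */
} LinePointerState;

/* Hypothetical packed line pointer: 15 + 2 + 15 bits in one 32-bit word. */
typedef struct ItemIdSketch
{
    unsigned lp_off   : 15;     /* tuple offset, or redirect target */
    unsigned lp_flags : 2;      /* one of the four states above */
    unsigned lp_len   : 15;     /* tuple length; 0 means no storage */
} ItemIdSketch;

/* A slot has storage only when a tuple length is recorded. */
static bool
item_has_storage(const ItemIdSketch *id)
{
    return id->lp_len != 0;
}

/* Turn a slot into a redirect; the state is explicit, not a magic lp_off. */
static void
item_set_redirect(ItemIdSketch *id, unsigned target)
{
    id->lp_off = target;
    id->lp_flags = LP_REDIRECT;
    id->lp_len = 0;
}
```

Because the redirect and dead states have their own flag values, readers can test `lp_flags` directly instead of inferring the state from special `lp_off` or `lp_len` values.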
  36. 12 Sep, 2007 1 commit
  37. 11 Sep, 2007 2 commits