提交 · cbe9d6beb4ae1cb20c08cab29b534be4923b6768 · Greenplum / Gpdb

10 2月, 2010 1 次提交

Fix up rickety handling of relation-truncation interlocks. · cbe9d6be

由 Tom Lane 提交于 2月 09, 2010

Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr
relation entries, so that they will get reset to InvalidBlockNumber whenever
an smgr-level flush happens. Because we now send smgr invalidation messages
immediately (not at end of transaction) when a relation truncation occurs,
this ensures that other backends will reset their values before they next
access the relation. We no longer need the unreliable assumption that a
VACUUM that's doing a truncation will hold its AccessExclusive lock until
commit --- in fact, we can intentionally release that lock as soon as we've
completed the truncation. This patch therefore reverts (most of) Alvaro's
patch of 2009-11-10, as well as my marginal hacking on it yesterday. We can
also get rid of assorted no-longer-needed relcache flushes, which are far more
expensive than an smgr flush because they kill a lot more state.

In passing this patch fixes smgr_redo's failure to perform visibility-map
truncation, and cleans up some rather dubious assumptions in freespace.c and
visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of
date.

cbe9d6be

09 2月, 2010 1 次提交

Rearrange lazy-vacuum code a little bit to reduce the window between · 95289e4a

由 Tom Lane 提交于 2月 09, 2010

truncating the table and transaction commit.  This isn't really making
it safe, but at least there is no good reason to do free space map
cleanup within the risk window.  Don't lock out cancel interrupts
until we have to, either.

95289e4a

08 2月, 2010 1 次提交

Remove old-style VACUUM FULL (which was known for a little while as · 0a469c87

由 Tom Lane 提交于 2月 08, 2010

VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
Per discussion, the use case for this method of vacuuming is no longer large
enough to justify maintaining it; not to mention that we don't wish to invest
the work that would be needed to make it play nicely with Hot Standby.

Aside from the code directly related to old-style VACUUM FULL, this commit
removes support for certain WAL record types that could only be generated
within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
nontransactional generation of cache invalidation sinval messages (the last
being the sticking point for Hot Standby).

We still have to retain all code that copes with finding HEAP_MOVED_OFF and
HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long
as we want to support in-place update from pre-9.0 databases.

0a469c87

28 1月, 2010 1 次提交

Change a few remaining calls of XLogArchivingActive() to use · e0e8b963

由 Heikki Linnakangas 提交于 1月 28, 2010

XLogIsNeeded() instead, to determine if an otherwise non-logged operation
needs to be logged in WAL for standby servers.

Fujii Masao

e0e8b963

03 1月, 2010 1 次提交
- B
  
  Update copyright for the year 2010. · 02398008
  由 Bruce Momjian 提交于 1月 02, 2010
  
  02398008
31 12月, 2009 1 次提交

Revise pgstat's tracking of tuple changes to improve the reliability of · 48c192c1

由 Tom Lane 提交于 12月 30, 2009

decisions about when to auto-analyze.

The previous code depended on n_live_tuples + n_dead_tuples - last_anl_tuples,
where all three of these numbers could be bad estimates from ANALYZE itself.
Even worse, in the presence of a steady flow of HOT updates and matching
HOT-tuple reclamations, auto-analyze might never trigger at all, even if all
three numbers are exactly right, because n_dead_tuples could hold steady.

To fix, replace last_anl_tuples with an accurately tracked count of the total
number of committed tuple inserts + updates + deletes since the last ANALYZE
on the table. This can still be compared to the same threshold as before, but
it's much more trustworthy than the old computation. Tracking this requires
one more intra-transaction counter per modified table within backends, but no
additional memory space in the stats collector. There probably isn't any
measurable speed difference; if anything it might be a bit faster than before,
since I was able to eliminate some per-tuple arithmetic operations in favor of
adding sums once per (sub)transaction.

Also, simplify the logic around pgstat vacuum and analyze reporting messages
by not trying to fold VACUUM ANALYZE into a single pgstat message.

The original thought behind this patch was to allow scheduling of analyzes
on parent tables by artificially inflating their changes_since_analyze count.
I've left that for a separate patch since this change seems to stand on its
own merit.

48c192c1

19 12月, 2009 1 次提交

Allow read only connections during recovery, known as Hot Standby. · efc16ea5

由 Simon Riggs 提交于 12月 19, 2009

Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.

New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.

efc16ea5

17 11月, 2009 1 次提交

Provide a parenthesized-options syntax for VACUUM, analogous to that recently · 5e66a51c

由 Tom Lane 提交于 11月 16, 2009

adopted for EXPLAIN.  This will allow additional options to be implemented
in future without having to make them fully-reserved keywords.  The old syntax
remains available for existing options, however.

Itagaki Takahiro

5e66a51c

11 11月, 2009 1 次提交

Fix longstanding problems in VACUUM caused by untimely interruptions · e7ec0222

由 Alvaro Herrera 提交于 11月 10, 2009

In VACUUM FULL, an interrupt after the initial transaction has been recorded
as committed can cause postmaster to restart with the following error message:
PANIC: cannot abort transaction NNNN, it was already committed
This problem has been reported many times.

In lazy VACUUM, an interrupt after the table has been truncated by
lazy_truncate_heap causes other backends' relcache to still point to the
removed pages; this can cause future INSERT and UPDATE queries to error out
with the following error message:
could not read block XX of relation 1663/NNN/MMMM: read only 0 of 8192 bytes
The window to this race condition is extremely narrow, but it has been seen in
the wild involving a cancelled autovacuum process.

The solution for both problems is to inhibit interrupts in both operations
until after the respective transactions have been committed. It's not a
complete solution, because the transaction could theoretically be aborted by
some other error, but at least fixes the most common causes of both problems.

e7ec0222

24 8月, 2009 1 次提交

Fix a violation of WAL coding rules in the recent patch to include an · 7fc7a7c4

由 Tom Lane 提交于 8月 24, 2009

"all tuples visible" flag in heap page headers. The flag update *must*
be applied before calling XLogInsert, but heap_update and the tuple
moving routines in VACUUM FULL were ignoring this rule. A crash and
replay could therefore leave the flag incorrectly set, causing rows
to appear visible in seqscans when they should not be. This might explain
recent reports of data corruption from Jeff Ross and others.

In passing, do a bit of editorialization on comments in visibilitymap.c.

7fc7a7c4

11 6月, 2009 1 次提交
- B
  8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list · d7471402
  由 Bruce Momjian 提交于 6月 11, 2009
```
provided by Andrew.
```
  d7471402
07 6月, 2009 1 次提交

Improve the IndexVacuumInfo/IndexBulkDeleteResult API to allow somewhat sane · 32ea2363

由 Tom Lane 提交于 6月 06, 2009

behavior in cases where we don't know the heap tuple count accurately; in
particular partial vacuum, but this also makes the API a bit more useful
for ANALYZE. This patch adds "estimated_count" flags to both structs so
that an approximate count can be flagged as such, and adjusts the logic
so that approximate counts are not used for updating pg_class.reltuples.

This fixes my previous complaint that VACUUM was putting ridiculous values
into pg_class.reltuples for indexes. The actual impact of that bug is
limited, because the planner only pays attention to reltuples for an index
if the index is partial; which probably explains why beta testers hadn't
noticed a degradation in plan quality from it. But it needs to be fixed.

The whole thing is a bit messy and should be redesigned in future, because
reltuples now has the potential to drift quite far away from reality when
a long period elapses with no non-partial vacuums. But this is as good as
it's going to get for 8.4.

32ea2363

25 3月, 2009 1 次提交

Implement "fastupdate" support for GIN indexes, in which we try to accumulate · ff301d6e

由 Tom Lane 提交于 3月 24, 2009

multiple index entries in a holding area before adding them to the main index
structure.  This helps because bulk insert is (usually) significantly faster
than retail insert for GIN.

This patch also removes GIN support for amgettuple-style index scans.  The
API defined for amgettuple is difficult to support with fastupdate, and
the previously committed partial-match feature didn't really work with
it either.  We might eventually figure a way to put back amgettuple
support, but it won't happen for 8.4.

catversion bumped because of change in GIN's pg_am entry, and because
the format of GIN indexes changed on-disk (there's a metapage now,
and possibly a pending list).

Teodor Sigaev

ff301d6e

23 1月, 2009 1 次提交

Only skip pages marked as clean in the visibility map, if the last 32 · bf136cf6

由 Heikki Linnakangas 提交于 1月 22, 2009

pages were marked as clean as well. The idea is to avoid defeating OS
readahead by skipping a page here and there, and also makes it less likely
that we miss an opportunity to advance relfrozenxid, for the sake of only
a few skipped pages.

bf136cf6

16 1月, 2009 1 次提交
- H
  Add vacuum_freeze_table_age GUC option, to control when VACUUM should · 65878185
  由 Heikki Linnakangas 提交于 1月 16, 2009
```
ignore the visibility map and scan the whole table, to advance
relfrozenxid.
```
  65878185
06 1月, 2009 1 次提交

Fix logic in lazy vacuum to decide if it's worth trying to truncate the heap. · 7ffe6572

由 Heikki Linnakangas 提交于 1月 06, 2009

If the table was smaller than REL_TRUNCATE_FRACTION (= 16) pages, we always
tried to acquire AccessExclusiveLock on it even if there was no empty pages
at the end.

Report by Simon Riggs. Back-patch all the way to 7.4.

7ffe6572

02 1月, 2009 1 次提交
- B
  
  Update copyright for 2009. · 511db38a
  由 Bruce Momjian 提交于 1月 01, 2009
  
  511db38a
17 12月, 2008 1 次提交

Don't reset pg_class.reltuples and relpages in VACUUM, if any pages were · dcf84099

由 Heikki Linnakangas 提交于 12月 17, 2008

skipped. We could update relpages anyway, but it seems better to only
update it together with reltuples, because we use the reltuples/relpages
ratio in the planner. Also don't update n_live_tuples in pgstat.

ANALYZE in VACUUM ANALYZE now needs to update pg_class, if the
VACUUM-phase didn't do so. Added some boolean-passing to let analyze_rel
know if it should update pg_class or not.

I also moved the relcache invalidation (to update rd_targblock) from
vac_update_relstats to where RelationTruncate is called, because
vac_update_relstats is not called for partial vacuums anymore. It's more
obvious to send the invalidation close to the truncation that requires it.

Per report by Ned T. Crigler.

dcf84099

04 12月, 2008 1 次提交

Utilize the visibility map in autovacuum, too. There was an oversight in · 7537f52a

由 Heikki Linnakangas 提交于 12月 04, 2008

the visibility map patch that because autovacuum always sets
VacuumStmt->freeze_min_age, visibility map was never used for autovacuum,
only for manually launched vacuums. This patch introduces a new scan_all
field to VacuumStmt, indicating explicitly whether the visibility map
should be used, or the whole relation should be scanned, to advance
relfrozenxid. Anti-wraparound vacuums still need to scan all pages.

7537f52a

03 12月, 2008 1 次提交

Introduce visibility map. The visibility map is a bitmap with one bit per · 608195a3

由 Heikki Linnakangas 提交于 12月 03, 2008

heap page, where a set bit indicates that all tuples on the page are
visible to all transactions, and the page therefore doesn't need
vacuuming. It is stored in a new relation fork.

Lazy vacuum uses the visibility map to skip pages that don't need
vacuuming. Vacuum is also responsible for setting the bits in the map.
In the future, this can hopefully be used to implement index-only-scans,
but we can't currently guarantee that the visibility map is always 100%
up-to-date.

In addition to the visibility map, there's a new PD_ALL_VISIBLE flag on
each heap page, also indicating that all tuples on the page are visible to
all transactions. It's important that this flag is kept up-to-date. It
is also used to skip visibility tests in sequential scans, which gives a
small performance gain on seqscans.

608195a3

19 11月, 2008 1 次提交

Rethink the way FSM truncation works. Instead of WAL-logging FSM · 33960006

由 Heikki Linnakangas 提交于 11月 19, 2008

truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To
make that cleaner from modularity point of view, move the WAL-logging one
level up to RelationTruncate, and move RelationTruncate and all the
related WAL-logging to new src/backend/catalog/storage.c file. Introduce
new RelationCreateStorage and RelationDropStorage functions that are used
instead of calling smgrcreate/smgrscheduleunlink directly. Move the
pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new
functions. This leaves smgr.c as a thin wrapper around md.c; all the
transactional stuff is now in storage.c.

This will make it easier to add new forks with similar truncation logic,
like the visibility map.

33960006

10 11月, 2008 1 次提交
- T
  Make relhasrules and relhastriggers work like relhasindex, namely we let · c5451c22
  由 Tom Lane 提交于 11月 10, 2008
```
VACUUM reset them to false rather than trying to clean 'em up during DROP.
```
  c5451c22
31 10月, 2008 1 次提交

Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer · 19c8dc83

由 Heikki Linnakangas 提交于 10月 31, 2008

functions into one ReadBufferExtended function, that takes the strategy
and mode as argument. There's three modes, RBM_NORMAL which is the default
used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and
a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages
without throwing an error. The FSM needs the new mode to recover from
corrupt pages, which could happend if we crash after extending an FSM file,
and the new page is "torn".

Add fork number to some error messages in bufmgr.c, that still lacked it.

19c8dc83

30 9月, 2008 1 次提交

Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the · 15c121b3

由 Heikki Linnakangas 提交于 9月 30, 2008

free space information is stored in a dedicated FSM relation fork, with each
relation (except for hash indexes; they don't use FSM).

This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any
trace of them from the backend, initdb, and documentation.

Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
introduce a new variant of the get_raw_page(regclass, int4, int4) function in
contrib/pageinspect that let's you to return pages from any relation fork, and
a new fsm_page_contents() function to inspect the new FSM pages.

15c121b3

12 5月, 2008 1 次提交

Restructure some header files a bit, in particular heapam.h, by removing some · f8c4d7db

由 Alvaro Herrera 提交于 5月 12, 2008

unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.

f8c4d7db

27 3月, 2008 1 次提交

Move the HTSU_Result enum definition into snapshot.h, to avoid including · 73b0300b

由 Alvaro Herrera 提交于 3月 26, 2008

tqual.h into heapam.h.  This makes all inclusion of tqual.h explicit.

I also sorted alphabetically the includes on some source files.

73b0300b

25 3月, 2008 1 次提交

Fix various infelicities that have snuck into usage of errdetail() and · 32b58d02

由 Tom Lane 提交于 3月 24, 2008

friends. Avoid double translation of some messages, ensure other messages
are exposed for translation (and make them follow the style guidelines),
avoid unsafe passing of an unpredictable message text as a format string.

32b58d02

10 3月, 2008 1 次提交

Reduce memory consumption during VACUUM of large relations, by using · 3fcc7e8e

由 Tom Lane 提交于 3月 10, 2008

FSMPageData (6 bytes) instead of PageFreeSpaceInfo (8 or 16 bytes)
for the temporary array of page-free-space information.

Itagaki Takahiro

3fcc7e8e

02 1月, 2008 1 次提交
- B
  
  Update copyrights in source tree to 2008. · 9098ab9e
  由 Bruce Momjian 提交于 1月 01, 2008
  
  9098ab9e
16 11月, 2007 1 次提交
- B
  
  pgindent run for 8.3. · fdf5a5ef
  由 Bruce Momjian 提交于 11月 15, 2007
  
  fdf5a5ef
27 9月, 2007 1 次提交
- A
  Adjust the new memory limit in the lazy vacuum code to use MaxHeapTuplesPerPage · b83e1163
  由 Alvaro Herrera 提交于 9月 26, 2007
```
tuples per page instead of fixed 200, to better cope with systems that use a
different block size.
```
  b83e1163
24 9月, 2007 2 次提交

Reduce the size of memory allocations by lazy vacuum when processing a small · 58536626

由 Alvaro Herrera 提交于 9月 24, 2007

table, by allocating just enough for a hardcoded number of dead tuples per
page.  The current estimate is 200 dead tuples per page.

Per reports from Jeff Amiel, Erik Jones and Marko Kreen, and subsequent
discussion.
CVS: ----------------------------------------------------------------------
CVS: Enter Log.  Lines beginning with `CVS:' are removed automatically
CVS:
CVS: Committing in .
CVS:
CVS: Modified Files:
CVS: 	commands/vacuumlazy.c
CVS: ----------------------------------------------------------------------

58536626

Simplify and rename some GUC variables, per various recent discussions: · 48f7e643

由 Tom Lane 提交于 9月 24, 2007

* stats_start_collector goes away; we always start the collector process,
unless prevented by a problem with setting up the stats UDP socket.

* stats_reset_on_server_start goes away; it seems useless in view of the
availability of pg_stat_reset().

* stats_block_level and stats_row_level are merged into a single variable
"track_counts", which controls all reports sent to the collector process.

* stats_command_string is renamed to track_activities.

* log_autovacuum is renamed to log_autovacuum_min_duration to better reflect
its meaning.

The log_autovacuum change is not a compatibility issue since it didn't exist
before 8.3 anyway.  The other changes need to be release-noted.

48f7e643

21 9月, 2007 2 次提交

T
Revert ill-fated patch to release exclusive lock early after vacuum · eb5f4d6c
由 Tom Lane 提交于 9月 20, 2007
```
truncates a table.  Introduces race condition, as shown by buildfarm
failures.
```
eb5f4d6c

HOT updates. When we update a tuple without changing any of its indexed · 282d2a03

由 Tom Lane 提交于 9月 20, 2007

columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version. Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.

In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.

Pavan Deolasee, with help from a bunch of other people.

282d2a03

16 9月, 2007 1 次提交

Fix aboriginal mistake in lazy VACUUM's code for truncating away · 43b0c918

由 Tom Lane 提交于 9月 16, 2007

no-longer-needed pages at the end of a table. We thought we could throw away
pages containing HEAPTUPLE_DEAD tuples; but this is not so, because such
tuples very likely have index entries pointing at them, and we wouldn't have
removed the index entries. The problem only emerges in a somewhat unlikely
race condition: the dead tuples have to have been inserted by a transaction
that later aborted, and this has to have happened between VACUUM's initial
scan of the page and then rechecking it for empty in count_nondeletable_pages.
But that timespan will include an index-cleaning pass, so it's not all that
hard to hit. This seems to explain a couple of previously unsolved bug
reports.

43b0c918

13 9月, 2007 1 次提交

Redefine the lp_flags field of item pointers as having four states, rather · 68893035

由 Tom Lane 提交于 9月 12, 2007

than two independent bits (one of which was never used in heap pages anyway,
or at least hadn't been in a very long time). This gives us flexibility to
add the HOT notions of redirected and dead item pointers without requiring
anything so klugy as magic values of lp_off and lp_len. The state values
are chosen so that for the states currently in use (pre-HOT) there is no
change in the physical representation.

68893035

12 9月, 2007 1 次提交
- A
  Add a CHECK_FOR_INTERRUPTS call in the site where the vacuum delay point · 9588e1bd
  由 Alvaro Herrera 提交于 9月 12, 2007
```
was removed.
```
  9588e1bd
11 9月, 2007 2 次提交

A
Release the exclusive lock on the table early after truncating it in lazy · 6a10f0f7
由 Alvaro Herrera 提交于 9月 10, 2007
```
vacuum, instead of waiting till commit.
```
6a10f0f7

Remove the vacuum_delay_point call in count_nondeletable_pages, because we hold · 21c27af6

由 Alvaro Herrera 提交于 9月 10, 2007

an exclusive lock on the table at this point, which we want to release as soon
as possible. This is called in the phase of lazy vacuum where we truncate the
empty pages at the end of the table.

An alternative solution would be to lower the vacuum delay settings before
starting the truncating phase, but this doesn't work very well in autovacuum
due to the autobalancing code (which can cause other processes to change our
cost delay settings). This case could be considered in the balancing code, but
it is simpler this way.

21c27af6