1. 19 Nov 2008, 1 commit
    • Rethink the way FSM truncation works. Instead of WAL-logging FSM · 33960006
      Heikki Linnakangas committed
      truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To
      make that cleaner from modularity point of view, move the WAL-logging one
      level up to RelationTruncate, and move RelationTruncate and all the
      related WAL-logging to new src/backend/catalog/storage.c file. Introduce
      new RelationCreateStorage and RelationDropStorage functions that are used
      instead of calling smgrcreate/smgrscheduleunlink directly. Move the
      pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new
      functions. This leaves smgr.c as a thin wrapper around md.c; all the
      transactional stuff is now in storage.c.
      
      This will make it easier to add new forks with similar truncation logic,
      like the visibility map.
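The pending-unlink bookkeeping that this commit centralizes follows one rule: a file created in a transaction must be unlinked if the transaction aborts, while a file dropped in a transaction may be unlinked only once it commits. A minimal self-contained sketch of that rule (toy names and a fixed-size array, not the actual storage.c API):

```c
#include <stdbool.h>

#define MAX_PENDING 8

/* Toy model of the pending-deletion list kept by the transactional
 * storage layer. */
typedef struct
{
    int  relid;
    bool atCommit;          /* true: unlink at commit; false: at abort */
} ToyPendingDelete;

static ToyPendingDelete pending[MAX_PENDING];
static int npending;

static void
toy_relation_create_storage(int relid)
{
    pending[npending].relid = relid;
    pending[npending].atCommit = false;     /* undo the create on abort */
    npending++;
}

static void
toy_relation_drop_storage(int relid)
{
    pending[npending].relid = relid;
    pending[npending].atCommit = true;      /* apply the drop on commit */
    npending++;
}

/* Returns how many files would actually be unlinked at transaction end. */
static int
toy_smgr_do_pending_deletes(bool isCommit)
{
    int n = 0;
    for (int i = 0; i < npending; i++)
        if (pending[i].atCommit == isCommit)
            n++;
    npending = 0;
    return n;
}
```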
  2. 07 Nov 2008, 1 commit
  3. 01 Nov 2008, 1 commit
    • Update FSM on WAL replay. This is a bit limited; the FSM is only updated · e9816533
      Heikki Linnakangas committed
      on non-full-page-image WAL records, and quite arbitrarily, only if there's
      less than 20% free space on the page after the insert/update (not on HOT
      updates, though). The 20% cutoff should avoid most of the overhead, when
      replaying a bulk insertion, for example, while ensuring that pages that
      are full are marked as full in the FSM.
      
      This is mostly to avoid the nasty worst case scenario, where you replay
      from a PITR archive, and the FSM information in the base backup is really
      out of date. If there were a lot of pages that the outdated FSM claimed
      had free space, but that didn't actually have any, the first unlucky inserter
      after the recovery would traverse through all those pages, just to find
      out that they're full. We didn't have this problem with the old FSM
      implementation, because we simply threw the FSM information away on a
      non-clean shutdown.
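The 20% cutoff described above amounts to a one-line test. The sketch below is only illustrative (the function name and the hard-coded 8 kB block size are assumptions, not the backend's actual code):

```c
#include <stdbool.h>

#define BLCKSZ 8192  /* PostgreSQL's default block size */

/* During WAL replay, update the FSM only when the page has become
 * fairly full: less than 20% of the block still free. */
static bool
fsm_update_worthwhile(int freespace)
{
    return freespace < BLCKSZ / 5;
}
```

This way a bulk load, whose pages typically replay as nearly full, triggers FSM updates only for the pages that actually matter to future inserters.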
  4. 31 Oct 2008, 1 commit
    • Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer · 19c8dc83
      Heikki Linnakangas committed
      functions into one ReadBufferExtended function that takes the strategy
      and mode as arguments. There are three modes: RBM_NORMAL, which is the
      default used by plain ReadBuffer(); RBM_ZERO, which replaces ZeroOrReadBuffer; and
      a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages
      without throwing an error. The FSM needs the new mode to recover from
      corrupt pages, which could happen if we crash after extending an FSM file,
      and the new page is "torn".
      
      Add fork number to some error messages in bufmgr.c, that still lacked it.
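The three modes differ only in how a read failure or a skipped read is handled. The toy stand-in below models that behavior; the enum names are the ones the commit introduces, but the "buffer manager" itself is a simplified sketch, not the real bufmgr.c code:

```c
#include <string.h>

/* The three read modes described in the commit message. */
typedef enum
{
    RBM_NORMAL,        /* read the page; report an error if it is corrupt */
    RBM_ZERO,          /* don't read at all; return a zero-filled page */
    RBM_ZERO_ON_ERROR  /* read, but hand back a zeroed page on corruption */
} ReadBufferMode;

/* Toy page fetch: 'corrupt' simulates a torn page on disk.
 * Returns 0 on success, -1 on error. */
static int
toy_read_page(ReadBufferMode mode, int corrupt, char *page, int pagesz)
{
    if (mode == RBM_ZERO)
    {
        memset(page, 0, pagesz);
        return 0;               /* success: caller will initialize the page */
    }
    if (corrupt)
    {
        if (mode == RBM_ZERO_ON_ERROR)
        {
            memset(page, 0, pagesz);
            return 0;           /* success: caller sees an empty page */
        }
        return -1;              /* RBM_NORMAL: surface the error */
    }
    memset(page, 'x', pagesz);  /* pretend this is the on-disk content */
    return 0;
}
```

Recovery of a torn FSM page is then just a read in RBM_ZERO_ON_ERROR mode followed by reinitialization of the zeroed page.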
  5. 28 Oct 2008, 1 commit
  6. 08 Oct 2008, 1 commit
  7. 30 Sep 2008, 1 commit
    • Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the · 15c121b3
      Heikki Linnakangas committed
      free space information is now stored in a dedicated FSM relation fork for each
      relation (except for hash indexes, which don't use the FSM).
      
      This eliminates the max_fsm_relations and max_fsm_pages GUC options; all
      traces of them have been removed from the backend, initdb, and documentation.
      
      Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
      introduce a new variant of the get_raw_page(regclass, int4, int4) function in
      contrib/pageinspect that lets you return pages from any relation fork, and
      a new fsm_page_contents() function to inspect the new FSM pages.
  8. 11 Sep 2008, 1 commit
    • Initialize the minimum frozen Xid in vac_update_datfrozenxid using · d53a5668
      Alvaro Herrera committed
      GetOldestXmin() instead of RecentGlobalXmin; this is safer because we do not
      depend on the latter being correctly set elsewhere, and while it is more
      expensive, this code path is not performance-critical.  This is a real
      risk for autovacuum, because it can execute whole cycles without doing
      a single vacuum, which would mean that RecentGlobalXmin would stay at its
      initialization value, FirstNormalTransactionId, causing a bogus value to be
      inserted in pg_database.  This bug could explain some recent reports of
      failure to truncate pg_clog.
      
      At the same time, change the initialization of RecentGlobalXmin to
      InvalidTransactionId, and ensure that it's set to something else whenever
      it's going to be used.  Using it as FirstNormalTransactionId in HOT page
      pruning could result in data loss.  InitPostgres takes care of setting it
      to a valid value, but the extra checks are there to prevent "special"
      backends from behaving in unusual ways.
      
      Per Tom Lane's detailed problem dissection in 29544.1221061979@sss.pgh.pa.us
  9. 11 Aug 2008, 1 commit
    • Introduce the concept of relation forks. An smgr relation can now consist · 3f0e808c
      Heikki Linnakangas committed
      of multiple forks, and each fork can be created and grown separately.
      
      The bulk of this patch is about changing the smgr API to include an extra
      ForkNumber argument in every smgr function. Also, smgrscheduleunlink and
      smgrdounlink no longer implicitly call smgrclose, because other forks might
      still exist after unlinking one. The callers of those functions have been
      modified to call smgrclose instead.
      
      This patch in itself doesn't have any user-visible effect, but provides the
      infrastructure needed for upcoming patches. The additional forks envisioned
      are a rewritten FSM implementation that doesn't rely on a fixed-size shared
      memory block, and a visibility map to allow skipping portions of a table in
      VACUUM that have no dead tuples.
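Concretely, each fork of a relation becomes a separate file, distinguished by a filename suffix. The sketch below shows the fork numbers as they ended up being used (MAIN plus the FSM and visibility-map forks this commit anticipates); the `fork_suffix` helper is an illustrative stand-in, not the backend's function:

```c
/* Fork numbers, matching the names PostgreSQL adopted. */
typedef enum ForkNumber
{
    MAIN_FORKNUM = 0,
    FSM_FORKNUM,
    VISIBILITYMAP_FORKNUM
} ForkNumber;

/* Filename suffix for each fork of a relation. */
static const char *
fork_suffix(ForkNumber forknum)
{
    switch (forknum)
    {
        case MAIN_FORKNUM:          return "";      /* e.g. base/16384/16385 */
        case FSM_FORKNUM:           return "_fsm";  /* base/16384/16385_fsm */
        case VISIBILITYMAP_FORKNUM: return "_vm";   /* base/16384/16385_vm */
    }
    return "";
}
```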
  10. 14 Jul 2008, 1 commit
    • Clean up the use of some page-header-access macros: principally, use · 9d035f42
      Tom Lane committed
      SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that
      makes the code clearer, and avoid casting between Page and PageHeader where
      possible.  Zdenek Kotala, with some additional cleanup by Heikki Linnakangas.
      
      I did not apply the parts of the proposed patch that would have resulted in
      slightly changing the on-disk format of hash indexes; it seems to me that's
      not a win as long as there's any chance of having in-place upgrade for 8.4.
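The distinction matters because PageHeaderData ends with a one-element line-pointer array, so `sizeof(PageHeaderData)` counts one line pointer that is not part of the fixed header; PostgreSQL's SizeOfPageHeaderData is the offset of that array instead. A cut-down stand-in (not the real field layout) makes the difference visible:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy analogue of PageHeaderData: fixed fields followed by a
 * one-element line-pointer array. */
typedef struct
{
    uint16_t pd_fixed[10];  /* stand-in for the real fixed header fields */
    uint32_t pd_linp[1];    /* line-pointer array starts here */
} ToyPageHeaderData;

/* "Size of the fixed header" = where the line pointers begin,
 * mirroring PostgreSQL's SizeOfPageHeaderData definition. */
#define ToySizeOfPageHeaderData offsetof(ToyPageHeaderData, pd_linp)
```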
  11. 19 Jun 2008, 1 commit
  12. 12 Jun 2008, 1 commit
    • Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation · a213f1ee
      Heikki Linnakangas committed
      forks. XLogOpenRelation() and the associated light-weight relation cache in
      xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument,
      instead of Relation.
      
      For functions that still need a Relation struct during WAL replay, there's a
      new function called CreateFakeRelcacheEntry() that returns a fake entry like
      XLogOpenRelation() used to.
  13. 09 Jun 2008, 1 commit
  14. 13 May 2008, 1 commit
  15. 12 May 2008, 1 commit
    • Restructure some header files a bit, in particular heapam.h, by removing some · f8c4d7db
      Alvaro Herrera committed
      unnecessary #include lines in it.  Also, move some tuple routine prototypes and
      macros to htup.h, which allows removal of heapam.h inclusion from some .c
      files.
      
      For this to work, a new header file access/sysattr.h needed to be created,
      initially containing attribute numbers of system columns, for pg_dump usage.
      
      While at it, make contrib ltree, intarray and hstore header files more
      consistent with our header style.
  16. 04 Apr 2008, 1 commit
  17. 27 Mar 2008, 3 commits
  18. 09 Mar 2008, 1 commit
    • Refactor heap_page_prune so that instead of changing item states on-the-fly, · 6f10eb21
      Tom Lane committed
      it accumulates the set of changes to be made and then applies them.  It had
      to accumulate the set of changes anyway to prepare a WAL record for the
      pruning action, so this isn't an enormous change; the only new complexity is
      to not doubly mark tuples that are visited twice in the scan.  The main
      advantage is that we can substantially reduce the scope of the critical
      section in which the changes are applied, thus avoiding PANIC in foreseeable
      cases like running out of memory in inval.c.  A nice secondary advantage is
      that it is now far clearer that WAL replay will actually do the same thing
      that the original pruning did.
      
      This commit doesn't do anything about the open problem that
      CacheInvalidateHeapTuple doesn't have the right semantics for a CTID change
      caused by collapsing out a redirect pointer.  But whatever we do about that,
      it'll be a good idea to not do it inside a critical section.
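The accumulate-then-apply pattern, including the guard against recording a tuple twice, can be sketched in a few lines. Everything below is a toy model with illustrative names, not the heap_page_prune code itself:

```c
#include <stdbool.h>

#define MAX_ITEMS 16

/* Scan phase: record which items to mark dead, outside any critical
 * section; the apply phase would then make all changes at once. */
typedef struct
{
    int  ndead;
    int  dead_items[MAX_ITEMS];
    bool marked[MAX_ITEMS];     /* guards against double-recording */
} ToyPruneState;

static void
record_dead(ToyPruneState *st, int item)
{
    if (st->marked[item])
        return;                 /* tuple visited twice: record only once */
    st->marked[item] = true;
    st->dead_items[st->ndead++] = item;
}
```

Because the full change set exists before anything is applied, the critical section can shrink to just the apply loop plus the WAL insert, which is the PANIC-avoidance win the commit describes.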
  19. 05 Mar 2008, 1 commit
    • Fix PREPARE TRANSACTION to reject the case where the transaction has dropped a · 7d6e6e2e
      Tom Lane committed
      temporary table; we can't support that because there's no way to clean up the
      source backend's internal state if the eventual COMMIT PREPARED is done by
      another backend.  This was checked correctly in 8.1 but I broke it in 8.2 :-(.
      Patch by Heikki Linnakangas, original trouble report by John Smith.
  20. 31 Jan 2008, 1 commit
  21. 14 Jan 2008, 1 commit
    • Fix CREATE INDEX CONCURRENTLY so that it won't use synchronized scan for · d3b1b1f9
      Tom Lane committed
      its second pass over the table.  It has to start at block zero, else the
      "merge join" logic for detecting which TIDs are already in the index
      doesn't work.  Hence, extend heapam.c's API so that callers can enable or
      disable syncscan.  (I put in an option to disable buffer access strategy,
      too, just in case somebody needs it.)  Per report from Hannes Dorbath.
  22. 02 Jan 2008, 1 commit
  23. 01 Dec 2007, 1 commit
    • Avoid incrementing the CommandCounter when CommandCounterIncrement is called · 895a94de
      Tom Lane committed
      but no database changes have been made since the last CommandCounterIncrement.
      This should result in a significant improvement in the number of "commands"
      that can typically be performed within a transaction before hitting the 2^32
      CommandId size limit.  In particular this buys back (and more) the possible
      adverse consequences of my previous patch to fix plan caching behavior.
      
      The implementation requires tracking whether the current CommandCounter
      value has been "used" to mark any tuples.  CommandCounter values stored into
      snapshots are presumed not to be used for this purpose.  This requires some
      small executor changes, since the executor used to conflate the curcid of
      the snapshot it was using with the command ID to mark output tuples with.
      Separating these concepts allows some small simplifications in executor APIs.
      
      Something for the TODO list: look into having CommandCounterIncrement not do
      AcceptInvalidationMessages.  It seems fairly bogus to be doing it there,
      but exactly where to do it instead isn't clear, and I'm disinclined to mess
      with asynchronous behavior during late beta.
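The core of the optimization is a "used" flag on the current command ID: the counter advances only if the current value was actually stored into a tuple since the last increment. A toy model (illustrative names, not the backend's):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    uint32_t cid;       /* current command ID */
    bool     cid_used;  /* has this cid been stored into any tuple? */
} ToyCommandCounter;

static void
toy_mark_tuple(ToyCommandCounter *cc)
{
    cc->cid_used = true;        /* current cid is now visible in a tuple */
}

static void
toy_command_counter_increment(ToyCommandCounter *cc)
{
    if (!cc->cid_used)
        return;                 /* nothing marked: safe to reuse the cid */
    cc->cid++;
    cc->cid_used = false;
}
```

A long transaction full of read-only commands thus consumes no CommandIds at all, which is how this buys headroom under the 2^32 limit.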
  24. 16 Nov 2007, 1 commit
  25. 07 Nov 2007, 1 commit
  26. 17 Oct 2007, 1 commit
  27. 22 Sep 2007, 1 commit
  28. 21 Sep 2007, 1 commit
    • HOT updates. When we update a tuple without changing any of its indexed · 282d2a03
      Tom Lane committed
      columns, and the new version can be stored on the same heap page, we no longer
      generate extra index entries for the new version.  Instead, index searches
      follow the HOT-chain links to ensure they find the correct tuple version.
      
      In addition, this patch introduces the ability to "prune" dead tuples on a
      per-page basis, without having to do a complete VACUUM pass to recover space.
      VACUUM is still needed to clean up dead index entries, however.
      
      Pavan Deolasee, with help from a bunch of other people.
  29. 13 Sep 2007, 1 commit
    • Redefine the lp_flags field of item pointers as having four states, rather · 68893035
      Tom Lane committed
      than two independent bits (one of which was never used in heap pages anyway,
      or at least hadn't been in a very long time).  This gives us flexibility to
      add the HOT notions of redirected and dead item pointers without requiring
      anything so klugy as magic values of lp_off and lp_len.  The state values
      are chosen so that for the states currently in use (pre-HOT) there is no
      change in the physical representation.
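The four states fit in the existing 2-bit lp_flags field, which is why no magic lp_off/lp_len values are needed. The state names and values below match what PostgreSQL adopted; the struct is a cut-down look-alike of ItemIdData, not the real declaration:

```c
/* The four line-pointer states encoded in the 2-bit lp_flags field. */
#define LP_UNUSED   0   /* unused (lp_len should be 0) */
#define LP_NORMAL   1   /* used (lp_len > 0) */
#define LP_REDIRECT 2   /* HOT redirect (lp_len should be 0) */
#define LP_DEAD     3   /* dead, may or may not have storage */

/* Toy item pointer: 15 + 2 + 15 bits, as in ItemIdData. */
typedef struct
{
    unsigned lp_off   : 15;
    unsigned lp_flags : 2;
    unsigned lp_len   : 15;
} ToyItemIdData;
```

Since pre-HOT pages only ever used states 0 and 1, existing on-disk data reads back unchanged.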
  30. 08 Sep 2007, 1 commit
    • Don't take ProcArrayLock while exiting a transaction that has no XID; there is · 0a51e707
      Tom Lane committed
      no need for serialization against snapshot-taking because the xact doesn't
      affect anyone else's snapshot anyway.  Per discussion.  Also, move various
      info about the interlocking of transactions and snapshots out of code comments
      and into a hopefully-more-cohesive discussion in access/transam/README.
      
      Also, remove a couple of now-obsolete comments about having to force some WAL
      to be written to persuade RecordTransactionCommit to do its thing.
  31. 06 Sep 2007, 1 commit
    • Implement lazy XID allocation: transactions that do not modify any database · 295e6398
      Tom Lane committed
      rows will normally never obtain an XID at all.  We already did things this way
      for subtransactions, but this patch extends the concept to top-level
      transactions.  In applications where there are lots of short read-only
      transactions, this should improve performance noticeably; not so much from
      removal of the actual XID-assignments, as from reduction of overhead that's
      driven by the rate of XID consumption.  We add a concept of a "virtual
      transaction ID" so that active transactions can be uniquely identified even
      if they don't have a regular XID.  This is a much lighter-weight concept:
      uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
      record is made about them.
      
      Florian Pflug, with some editorialization by Tom.
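A virtual transaction ID can be as simple as a (backend slot, backend-local counter) pair: unique while the backend lives, never persisted. The field names below follow the concept the commit describes, but the struct is a simplified stand-in, not the backend's definition:

```c
#include <stdint.h>

/* Sketch of a VXID: identifies an active transaction without
 * consuming a permanent, on-disk XID. */
typedef struct
{
    int      backendId;             /* which backend slot */
    uint32_t localTransactionId;    /* counter local to that backend */
} ToyVirtualTransactionId;

static int
toy_vxid_equals(ToyVirtualTransactionId a, ToyVirtualTransactionId b)
{
    return a.backendId == b.backendId &&
           a.localTransactionId == b.localTransactionId;
}
```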
  32. 15 Aug 2007, 1 commit
  33. 10 Jun 2007, 1 commit
    • Teach heapam code to know the difference between a real seqscan and the · 85d72f05
      Tom Lane committed
      pseudo HeapScanDesc created for a bitmap heap scan.  This avoids some useless
      overhead during a bitmap scan startup, in particular invoking the syncscan
      code.  (We might someday want to do that, but right now it's merely useless
      contention for shared memory, to say nothing of possibly pushing useful
      entries out of syncscan's small LRU list.)  This also allows elimination of
      ugly pgstat_discount_heap_scan() kluge.
  34. 09 Jun 2007, 1 commit
  35. 31 May 2007, 1 commit
    • Make large sequential scans and VACUUMs work in a limited-size "ring" of · d526575f
      Tom Lane committed
      buffers, rather than blowing out the whole shared-buffer arena.  Aside from
      avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
      to cause a WAL flush for every page it modified, because we had it hacked to
      use only a single buffer.  Those flushes will now occur only once per
      ring-ful.  The exact ring size, and the threshold for seqscans to switch into
      the ring usage pattern, remain under debate; but the infrastructure seems
      done.  The key bit of infrastructure is a new optional BufferAccessStrategy
      object that can be passed to ReadBuffer operations; this replaces the former
      StrategyHintVacuum API.
      
      This patch also changes the buffer usage-count methodology a bit: we now
      advance usage_count when first pinning a buffer, rather than when last
      unpinning it.  To preserve the behavior that a buffer's lifetime starts to
      decrease when it's released, the clock sweep code is modified to not decrement
      usage_count of pinned buffers.
      
      Work not done in this commit: teach GiST and GIN indexes to use the vacuum
      BufferAccessStrategy for vacuum-driven fetches.
      
      Original patch by Simon, reworked by Heikki and again by Tom.
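The usage-count change can be modeled in a few lines: the count is bumped at pin time, and the clock sweep decrements it only for unpinned buffers, so a buffer's "lifetime" starts decreasing once it is released. This is a toy model with illustrative names, not the bufmgr.c code (the cap of 5 mirrors PostgreSQL's BM_MAX_USAGE_COUNT):

```c
typedef struct
{
    int usage_count;
    int pinned;
} ToyBufferDesc;

static void
toy_pin(ToyBufferDesc *buf)
{
    buf->pinned++;
    if (buf->usage_count < 5)   /* cap, as with BM_MAX_USAGE_COUNT */
        buf->usage_count++;
}

static void
toy_unpin(ToyBufferDesc *buf)
{
    buf->pinned--;
}

/* One clock-sweep visit: pinned buffers keep their usage_count. */
static void
toy_sweep_tick(ToyBufferDesc *buf)
{
    if (!buf->pinned && buf->usage_count > 0)
        buf->usage_count--;
}
```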
  36. 27 May 2007, 1 commit
    • Fix up pgstats counting of live and dead tuples to recognize that committed · 77947c51
      Tom Lane committed
      and aborted transactions have different effects; also teach it not to assume
      that prepared transactions are always committed.
      
      Along the way, simplify the pgstats API by tying counting directly to
      Relations; I cannot detect any redeeming social value in having stats
      pointers in HeapScanDesc and IndexScanDesc structures.  And fix a few
      corner cases in which counts might be missed because the relation's
      pgstat_info pointer hadn't been set.
  37. 08 Apr 2007, 1 commit
  38. 03 Apr 2007, 1 commit
    • Decouple the values of TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE. · b3005276
      Tom Lane committed
      Add the latter to the values checked in pg_control, since it can't be changed
      without invalidating toast table content.  This commit in itself shouldn't
      change any behavior, but it lays some necessary groundwork for experimentation
      with these toast-control numbers.
      
      Note: while TOAST_TUPLE_THRESHOLD can now be changed without initdb, some
      thought still needs to be given to needs_toast_table() in toasting.c before
      unleashing random changes.