Commits · acac68b2bcae818bc8803b8cb8cbb17eee8d5e2b · Greenplum / Gpdb

27 Oct, 2007 1 commit

Allow an autovacuum worker to be interrupted automatically when it is found · acac68b2

Committed by Alvaro Herrera on Oct 26, 2007

to be locking another process (except when it's working to prevent Xid
wraparound problems).

25 Oct, 2007 1 commit

Rearrange vacuum-related bits in PGPROC as a bitmask, to better support · 745c1b2c

Committed by Alvaro Herrera on Oct 24, 2007

having several of them.  Add two more flags: whether the process is
executing an ANALYZE, and whether a vacuum is for Xid wraparound (which
is obviously only set by autovacuum).

Sneakily move the worker's recently-acquired PostAuthDelay to a more useful
place.

26 Sep, 2007 2 commits

Dept. of second thoughts: fix loop in BgBufferSync so that the exit when · 7a315a09

Committed by Tom Lane on Sep 25, 2007

bgwriter_lru_maxpages is exceeded leaves the loop variables in the
expected state. In the original coding, we'd fail to advance
next_to_clean, causing that buffer to be probably-uselessly rechecked next
time, and also have an off-by-one idea of the number of buffers scanned.

Just-in-time background writing strategy. This code avoids re-scanning · 6f5c38dc

Committed by Tom Lane on Sep 25, 2007

buffers that cannot possibly need to be cleaned, and estimates how many
buffers it should try to clean based on moving averages of recent allocation
requests and density of reusable buffers. The patch also adds a couple
more columns to pg_stat_bgwriter to help measure the effectiveness of the
bgwriter.

Greg Smith, building on his own work and ideas from several other people,
in particular a much older patch from Itagaki Takahiro.
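
The estimate described in this message can be illustrated with a small sketch. It is a toy model in Python, not PostgreSQL's C code: the function names, the window of 16 samples, and the rounding rule are assumptions chosen for illustration.

```python
def smoothed(prev_avg: float, sample: float, window: int = 16) -> float:
    """Moving-average update: step 1/window of the way toward the new sample."""
    return prev_avg + (sample - prev_avg) / window


def buffers_to_clean(avg_alloc: float, reusable_ahead: int, max_pages: int) -> int:
    """Clean buffers still wanted for the next round, capped at max_pages.

    avg_alloc       smoothed number of buffer allocations per round
    reusable_ahead  reusable buffers already known to lie ahead of the scan
    max_pages       per-round write limit (stands in for bgwriter_lru_maxpages)
    """
    wanted = int(avg_alloc + 0.5) - reusable_ahead
    return max(0, min(wanted, max_pages))
```

When enough reusable buffers already lie ahead, the function returns 0 and the round writes nothing, which is the "just-in-time" part of the idea.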

24 Sep, 2007 1 commit

TransactionIdIsInProgress can skip scanning the ProcArray if the target XID is · 1b3d400c

Committed by Tom Lane on Sep 23, 2007

later than latestCompletedXid, per Florian Pflug. Also some minor
improvements in the XIDCACHE_DEBUG code --- make sure each call of
TransactionIdIsInProgress is counted one way or another.
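
A minimal sketch of the shortcut, assuming that an XID later than the latest completed one cannot have finished yet and is therefore reported as in progress. The function names and the plain set standing in for the ProcArray are hypothetical; XIDs are modelled as 32-bit counters compared modulo 2**32, and special XIDs are ignored.

```python
XID_MASK = 0xFFFFFFFF  # XIDs are modelled as 32-bit counters that wrap around


def xid_follows(a: int, b: int) -> bool:
    """True if a is logically later than b under modulo-2**32 comparison."""
    diff = (a - b) & XID_MASK
    return 0 < diff < 0x80000000


def is_in_progress(xid, latest_completed_xid, running_xids):
    """Return (in_progress, scanned), where scanned tells whether the
    running-transaction collection had to be consulted."""
    if xid_follows(xid, latest_completed_xid):
        # Nothing later than the latest completed XID can have finished,
        # so the scan is skipped entirely.
        return True, False
    return xid in running_xids, True
```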

22 Sep, 2007 2 commits

Improve handling of prune/no-prune decisions by storing a page's oldest · cc59049d

Committed by Tom Lane on Sep 21, 2007

unpruned XMAX in its header. At the cost of 4 bytes per page, this keeps us
from performing heap_page_prune when there's no chance of pruning anything.
Seems to be necessary per Heikki's preliminary performance testing.

Make some simple performance improvements in TransactionIdIsInProgress(). · da072ab2

Committed by Tom Lane on Sep 21, 2007

For XIDs of our own transaction and subtransactions, it's cheaper to ask
TransactionIdIsCurrentTransactionId() than to look in shared memory.
Also, the xids[] work array is always the same size within any given
process, so malloc it just once instead of doing a palloc/pfree on every
call; aside from being faster this lets us get rid of some goto's, since
we no longer have any end-of-function pfree to do. Both ideas by Heikki.

21 Sep, 2007 1 commit

HOT updates. When we update a tuple without changing any of its indexed · 282d2a03

Committed by Tom Lane on Sep 20, 2007

columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version. Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.

In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.

Pavan Deolasee, with help from a bunch of other people.
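
The chain-following idea can be sketched as follows. This is a simplified model, not the actual heap code: the page is a dictionary keyed by a hypothetical slot number, visibility is reduced to a boolean, and the class and function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TupleVersion:
    value: str
    visible: bool                       # stand-in for a real visibility check
    next_offset: Optional[int] = None   # link to the newer version on the same page


def fetch_via_hot_chain(page: Dict[int, TupleVersion], root_offset: int) -> Optional[str]:
    """Start at the slot the index entry points to and follow the chain
    until a visible version is found."""
    offset = root_offset
    while offset is not None:
        tup = page[offset]
        if tup.visible:
            return tup.value
        offset = tup.next_offset
    return None


# One index entry (pointing at slot 1) serves all three versions of the row.
page = {
    1: TupleVersion("v1", visible=False, next_offset=2),
    2: TupleVersion("v2", visible=False, next_offset=3),
    3: TupleVersion("v3", visible=True),
}
```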

13 Sep, 2007 1 commit

Redefine the lp_flags field of item pointers as having four states, rather · 68893035

Committed by Tom Lane on Sep 12, 2007

than two independent bits (one of which was never used in heap pages anyway,
or at least hadn't been in a very long time). This gives us flexibility to
add the HOT notions of redirected and dead item pointers without requiring
anything so klugy as magic values of lp_off and lp_len. The state values
are chosen so that for the states currently in use (pre-HOT) there is no
change in the physical representation.
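
As an illustration, a two-bit field read as one of four states might look like the sketch below. The state names and numeric values are assumptions for this example (the message only says there are four states, two of them for HOT's redirected and dead item pointers), and `decode_state` is a hypothetical helper.

```python
from enum import IntEnum


class ItemState(IntEnum):
    UNUSED = 0    # slot is free
    NORMAL = 1    # slot is in ordinary use
    REDIRECT = 2  # HOT: slot forwards to another slot
    DEAD = 3      # HOT: slot is dead


def decode_state(lp_flags: int) -> ItemState:
    """Interpret the low two bits of lp_flags as one of the four states."""
    return ItemState(lp_flags & 0b11)
```

Because the two states in use before HOT keep the values 0 and 1 in this sketch, existing pages would decode the same way as before, which mirrors the compatibility point made in the message.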

09 Sep, 2007 1 commit

Replace the former method of determining snapshot xmax --- to wit, calling · 6bd4f401

Committed by Tom Lane on Sep 08, 2007

ReadNewTransactionId from GetSnapshotData --- with a "latestCompletedXid"
variable that is updated during transaction commit or abort. Since
latestCompletedXid is written only in places that had to lock ProcArrayLock
exclusively anyway, and is read only in places that had to lock ProcArrayLock
shared anyway, it adds no new locking requirements to the system despite being
cluster-wide. Moreover, removing ReadNewTransactionId from snapshot
acquisition eliminates the need to take both XidGenLock and ProcArrayLock at
the same time. Since XidGenLock is sometimes held across I/O this can be a
significant win. Some preliminary benchmarking suggested that this patch has
no effect on average throughput but can significantly improve the worst-case
transaction times seen in pgbench. Concept by Florian Pflug, implementation
by Tom Lane.

08 Sep, 2007 1 commit

Don't take ProcArrayLock while exiting a transaction that has no XID; there is · 0a51e707

Committed by Tom Lane on Sep 07, 2007

no need for serialization against snapshot-taking because the xact doesn't
affect anyone else's snapshot anyway. Per discussion. Also, move various
info about the interlocking of transactions and snapshots out of code comments
and into a hopefully-more-cohesive discussion in access/transam/README.

Also, remove a couple of now-obsolete comments about having to force some WAL
to be written to persuade RecordTransactionCommit to do its thing.

07 Sep, 2007 1 commit

Allow CREATE INDEX CONCURRENTLY to disregard transactions in other · cd1aae58

Committed by Tom Lane on Sep 07, 2007

databases, per gripe from hubert depesz lubaczewski.  Patch from
Simon Riggs.

06 Sep, 2007 2 commits

Volatile-qualify the ProcArray PGPROC pointer in a bunch of routines · 0ecb4ea7

Committed by Tom Lane on Sep 05, 2007

that examine fields that could change under them. This is just to make
really sure that when we are fetching a value 'only once', that's what
actually happens. Possibly this is a bug that should be back-patched,
but in the absence of solid evidence that it's needed, I won't bother.

Implement lazy XID allocation: transactions that do not modify any database · 295e6398

Committed by Tom Lane on Sep 05, 2007

rows will normally never obtain an XID at all. We already did things this way
for subtransactions, but this patch extends the concept to top-level
transactions. In applications where there are lots of short read-only
transactions, this should improve performance noticeably; not so much from
removal of the actual XID-assignments, as from reduction of overhead that's
driven by the rate of XID consumption. We add a concept of a "virtual
transaction ID" so that active transactions can be uniquely identified even
if they don't have a regular XID. This is a much lighter-weight concept:
uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
record is made about them.

Florian Pflug, with some editorialization by Tom.
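
A minimal sketch of the lazy-allocation idea, under stated assumptions: the `Transaction` class, its method names, and the (backend id, local counter) shape of the virtual ID are hypothetical, and an `itertools.count` iterator stands in for the shared XID counter.

```python
import itertools
from typing import Iterator, Optional, Tuple


class Transaction:
    def __init__(self, backend_id: int, local_id: int, xid_source: Iterator[int]):
        # The virtual ID exists from the start and leaves no on-disk record.
        self.vxid: Tuple[int, int] = (backend_id, local_id)
        self.xid: Optional[int] = None
        self._xid_source = xid_source

    def read_row(self) -> None:
        """Reads never need a regular XID."""

    def modify_row(self) -> int:
        """The first modification assigns an XID; later ones reuse it."""
        if self.xid is None:
            self.xid = next(self._xid_source)
        return self.xid


xids = itertools.count(100)  # stand-in for the shared XID counter
reader = Transaction(backend_id=1, local_id=1, xid_source=xids)
reader.read_row()            # consumes no XID
writer = Transaction(backend_id=2, local_id=1, xid_source=xids)
writer.modify_row()          # consumes XID 100
```

The read-only transaction stays identifiable through its virtual ID while the XID counter advances only for the writer.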

28 Aug, 2007 1 commit

Improve behavior of log_lock_waits patch. Ensure that something gets logged · 24d4517b

Committed by Tom Lane on Aug 28, 2007

even if the "deadlock detected" ERROR message is suppressed by an exception
catcher. Be clearer about the event sequence when a soft deadlock is fixed:
the fixing process might or might not still have to wait, so log that
separately. Fix race condition when someone releases us from the lock partway
through printing all this junk --- we'd not get confused about our state, but
the log message sequence could have been misleading, ie, a "still waiting"
message with no subsequent "acquired" message. Greg Stark and Tom Lane.

26 Jul, 2007 3 commits

Remove FileUnlink(), which wasn't being used anywhere and interacted poorly · e4f4a7f5

Committed by Tom Lane on Jul 26, 2007

with the recent patch to log temp file sizes at removal time.  Doesn't seem
worth fixing since it's unused.
In passing, make a few elog messages conform to the message style guide.

Arrange to put TOAST tables belonging to temporary tables into special schemas · 82eed4db

Committed by Tom Lane on Jul 25, 2007

named pg_toast_temp_nnn, alongside the pg_temp_nnn schemas used for the temp
tables themselves. This allows low-level code such as the relcache to
recognize that these tables are indeed temporary, which enables various
optimizations such as not WAL-logging changes and using local rather than
shared buffers for access. Aside from obvious performance benefits, this
provides a solution to bug #3483, in which other backends unexpectedly held
open file references to temporary tables. The scheme preserves the property
that TOAST tables are not in any schema that's normally in the search path,
so they don't conflict with user table names.

initdb forced because of changes in system view definitions.

Suppress warning when compiling with -DPROFILE_PID_DIR: sys/stat.h is · fdb5b69e

Committed by Tom Lane on Jul 25, 2007

supposed to be included when using mkdir().

21 Jul, 2007 1 commit

Fix WAL replay of truncate operations to cope with the possibility that the · 04fbe29a

Committed by Tom Lane on Jul 20, 2007

truncated relation was deleted later in the WAL sequence. Since replay
normally auto-creates a relation upon its first reference by a WAL log entry,
failure is seen only if the truncate entry happens to be the first reference
after the checkpoint we're restarting from; which is a pretty unusual case but
of course not impossible. Fix by making truncate entries auto-create like
the other ones do. Per report and test case from Dharmendra Goyal.

17 Jul, 2007 1 commit

Add comments spelling out why it's a good idea to release multiple · 82b36846

Committed by Tom Lane on Jul 16, 2007

partition locks in reverse order.

09 Jul, 2007 1 commit

Remove the pgstat_drop_relation() call from smgr_internal_unlink(), because · b09cb0cf

Committed by Tom Lane on Jul 08, 2007

we don't know at that point which relation OID to tell pgstat to forget.
The code was passing the relfilenode, which is incorrect, and could possibly
cause some other relation's stats to be zeroed out. While we could try to
clean this up, it seems much simpler and more reliable to let the next
invocation of pgstat_vacuum_tabstat() fix things; which indeed is how it
worked before I introduced the buggy code into 8.1.3 and later :-(.
Problem noticed by Itagaki Takahiro, fix is per subsequent discussion.

03 Jul, 2007 1 commit

Fix incorrect comment about the timing of AbsorbFsyncRequests() during · 83aaebba

Committed by Tom Lane on Jul 03, 2007

checkpoint. The comment claimed that we could do this anytime after
setting the checkpoint REDO point, but actually BufferSync is relying
on the assumption that buffers dumped by other backends will be fsync'd
too. So we really could not do it any sooner than we are doing it.

01 Jul, 2007 2 commits

Fix comments not updated in recent patch. · beba7376

Committed by Tom Lane on Jul 01, 2007

Improve logging of checkpoints. Patch by Greg Smith, worked over · 9fc25c05

Committed by Tom Lane on Jun 30, 2007

by Heikki and a little bit by me.

30 Jun, 2007 1 commit

Arrange for SIGINT in autovacuum workers to cancel the current table and · 10af02b9

Committed by Alvaro Herrera on Jun 29, 2007

continue with the schedule. Change current uses of SIGINT to abort a worker
into SIGTERM, which keeps the old behaviour of terminating the process.

Patch from ITAGAKI Takahiro, with some editorializing of my own.

28 Jun, 2007 1 commit

Implement "distributed" checkpoints in which the checkpoint I/O is spread · 867e2c91

Committed by Tom Lane on Jun 28, 2007

over a fairly long period of time, rather than being spat out in a burst.
This happens only for background checkpoints carried out by the bgwriter;
other cases, such as a shutdown checkpoint, are still done at full speed.

Remove the "all buffers" scan in the bgwriter, and associated stats
infrastructure, since this seems no longer very useful when the checkpoint
itself is properly throttled.

Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas,
and some minor API editorialization by me.
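
The pacing idea can be sketched as a small simulation. Everything here is an assumption made for illustration (the function names, the fixed per-write cost, the fixed nap length, and the progress rule); it is not the bgwriter's actual logic. The `target` duration must be greater than zero.

```python
def ahead_of_schedule(written: int, total: int, elapsed: float, target: float) -> bool:
    """True if the fraction of buffers written exceeds the fraction of time used."""
    if total == 0:
        return True
    return written / total > elapsed / target


def plan_checkpoint(total: int, target: float, write_cost: float, nap: float):
    """Simulate a paced checkpoint; return (finish_time, number_of_naps)."""
    elapsed, naps = 0.0, 0
    for written in range(1, total + 1):
        elapsed += write_cost  # time spent writing one buffer
        while written < total and ahead_of_schedule(written, total, elapsed, target):
            elapsed += nap     # throttle: wait for the schedule to catch up
            naps += 1
    return elapsed, naps
```

With 4 buffers, a target of 8.0 time units, and 0.5 units per write and per nap, the simulated checkpoint finishes at 6.5 units after 9 naps, whereas writing at full speed would take 2.0 units.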

20 Jun, 2007 2 commits

Only log 'process acquired lock' if we actually did get the lock. This · 9cce91db

Committed by Tom Lane on Jun 19, 2007

test seems inessential right now since the only control path for not
getting the lock is via CHECK_FOR_INTERRUPTS which won't return control
to ProcSleep, but it would be important if we ever allow the deadlock
code to kill someone else's transaction instead of our own.

Code review for log_lock_waits patch. Don't try to issue log messages from · 6e072287

Committed by Tom Lane on Jun 19, 2007

within a signal handler (this might be safe given the relatively narrow code
range in which the interrupt is enabled, but it seems awfully risky); do issue
more informative log messages that tell what is being waited for and the exact
length of the wait; minor other code cleanup. Greg Stark and Tom Lane.

18 Jun, 2007 1 commit

Update obsolete comment: it's no longer the case that mdread() will allow · de6a6383

Committed by Tom Lane on Jun 18, 2007

reads beyond EOF, except by special coercion.

13 Jun, 2007 1 commit

Add some simple defenses against null fields in pg_largeobject, and add · e976fd43

Committed by Tom Lane on Jun 12, 2007

comments noting that there's an alignment assumption now that the data
field could be in 1-byte-header format.  Per discussion with Greg Stark.

09 Jun, 2007 1 commit

Arrange for large sequential scans to synchronize with each other, so that · a04a4235

Committed by Tom Lane on Jun 08, 2007

when multiple backends are scanning the same relation concurrently, each page
is (ideally) read only once.

Jeff Davis, with review by Heikki and Tom.
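
One way such synchronization can work is for a new scan to begin at the block another scan is currently reading and wrap around to cover the rest. The sketch below assumes that mechanism; the `ScanPositions` hint table and `scan_order` helper are hypothetical names, not the actual implementation.

```python
from typing import Dict, List


class ScanPositions:
    """Shared hint table: relation name -> block most recently read by some scan."""

    def __init__(self) -> None:
        self._last_block: Dict[str, int] = {}

    def report(self, rel: str, block: int) -> None:
        self._last_block[rel] = block

    def start_block(self, rel: str) -> int:
        # With no scan in flight, start at the beginning as usual.
        return self._last_block.get(rel, 0)


def scan_order(nblocks: int, start: int) -> List[int]:
    """Visit every block exactly once, starting at `start` and wrapping around."""
    return [(start + i) % nblocks for i in range(nblocks)]


hints = ScanPositions()
hints.report("big_table", 3)  # an earlier scan is currently at block 3
order = scan_order(5, hints.start_block("big_table"))
```

The late-arriving scan reads blocks 3 and 4 alongside the earlier scan, then returns for blocks 0 to 2, so the shared portion is fetched only once.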

08 Jun, 2007 2 commits

Redefine IsTransactionState() to only return true for TRANS_INPROGRESS state, · 6d6d14b6

Committed by Tom Lane on Jun 07, 2007

which is the only state in which it's safe to initiate database queries.
It turns out that all but two of the callers thought that's what it meant;
and the other two were using it as a proxy for "will GetTopTransactionId()
return a nonzero XID"? Since it was in fact an unreliable guide to that,
make those two just invoke GetTopTransactionId() always, then deal with a
zero result if they get one.

Rework temp_tablespaces patch so that temp tablespaces are assigned separately · 24ee8af5

Committed by Tom Lane on Jun 07, 2007

for each temp file, rather than once per sort or hashjoin; this allows
spreading the data of a large sort or join across multiple tablespaces.
(I remain dubious that this will make any difference in practice, but certain
people insisted.)  Arrange to cache the results of parsing the GUC variable
instead of recomputing from scratch on every demand, and push usage of the
cache down to the bottommost fd.c level.

04 Jun, 2007 1 commit

Create a GUC parameter temp_tablespaces that allows selection of the · acfce502

Committed by Tom Lane on Jun 03, 2007

tablespace(s) in which to store temp tables and temporary files. This is a
list to allow spreading the load across multiple tablespaces (a random list
element is chosen each time a temp object is to be created). Temp files are
not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace
directories.

Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
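
The random selection described here can be sketched in a few lines. The helper names, the comma-separated parsing, and the use of `None` to mean "fall back to the default tablespace" are assumptions for illustration only.

```python
import random
from typing import List, Optional


def parse_tablespace_list(setting: str) -> List[str]:
    """Split a comma-separated setting into trimmed, non-empty names."""
    return [name.strip() for name in setting.split(",") if name.strip()]


def pick_tablespace(names: List[str], rng: random.Random) -> Optional[str]:
    """Choose one list element at random; None stands for the default."""
    return rng.choice(names) if names else None


names = parse_tablespace_list("fast1, fast2 ,fast3")
rng = random.Random(0)  # seeded only to make the example repeatable
picks = {pick_tablespace(names, rng) for _ in range(200)}
```

Over many temp objects every listed tablespace ends up being used, which is what spreads the load.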

02 Jun, 2007 2 commits

Fix aboriginal bug in BufFileDumpBuffer that would cause it to write the · 964ec46c

Committed by Tom Lane on Jun 01, 2007

wrong data when dumping a bufferload that crosses a component-file boundary.
This probably has not been seen in the wild because (a) component files are
normally 1GB apiece and (b) non-block-aligned buffer usage is relatively
rare. But it's fairly easy to reproduce a problem if one reduces RELSEG_SIZE
in a test build. Kudos to Kurt Harriman for spotting the bug.

Make CREATE/DROP/RENAME DATABASE wait a little bit to see if other backends · bd0a2609

Committed by Tom Lane on Jun 01, 2007

will exit before failing because of conflicting DB usage. Per discussion,
this seems a good idea to help mask the fact that backend exit takes nonzero
time. Remove a couple of thereby-obsoleted sleeps in contrib and PL
regression test sequences.

31 May, 2007 1 commit

Make large sequential scans and VACUUMs work in a limited-size "ring" of · d526575f

Committed by Tom Lane on May 30, 2007

buffers, rather than blowing out the whole shared-buffer arena. Aside from
avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
to cause a WAL flush for every page it modified, because we had it hacked to
use only a single buffer. Those flushes will now occur only once per
ring-ful. The exact ring size, and the threshold for seqscans to switch into
the ring usage pattern, remain under debate; but the infrastructure seems
done. The key bit of infrastructure is a new optional BufferAccessStrategy
object that can be passed to ReadBuffer operations; this replaces the former
StrategyHintVacuum API.

This patch also changes the buffer usage-count methodology a bit: we now
advance usage_count when first pinning a buffer, rather than when last
unpinning it. To preserve the behavior that a buffer's lifetime starts to
decrease when it's released, the clock sweep code is modified to not decrement
usage_count of pinned buffers.

Work not done in this commit: teach GiST and GIN indexes to use the vacuum
BufferAccessStrategy for vacuum-driven fetches.

Original patch by Simon, reworked by Heikki and again by Tom.
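
The usage-count change can be sketched with a toy clock sweep. This is a simplified model, not bufmgr code: the `Buf` class, the `pin` and `clock_sweep` helpers, and the cap of 5 on the usage count are assumptions, and the buffer list must be non-empty.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Buf:
    usage_count: int = 0
    pinned: bool = False


def pin(buf: Buf, max_usage: int = 5) -> None:
    """Usage is credited when the buffer is pinned, not when it is released."""
    buf.usage_count = min(buf.usage_count + 1, max_usage)
    buf.pinned = True


def clock_sweep(buffers: List[Buf], hand: int) -> Tuple[int, int]:
    """Return (victim_index, new_hand_position)."""
    n = len(buffers)
    max_steps = n * (max(b.usage_count for b in buffers) + 1)
    for _ in range(max_steps):
        candidate = hand
        hand = (hand + 1) % n
        buf = buffers[candidate]
        if buf.pinned:
            continue                # pinned: skip without touching usage_count
        if buf.usage_count == 0:
            return candidate, hand  # cold and unpinned: reuse this buffer
        buf.usage_count -= 1        # unpinned: age it by one tick
    raise RuntimeError("every buffer is pinned")
```

Because the sweep leaves pinned buffers alone, a buffer's count starts to decay only after it is released, which preserves the lifetime behavior the message describes.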

27 May, 2007 1 commit

Fix up pgstats counting of live and dead tuples to recognize that committed · 77947c51

Committed by Tom Lane on May 27, 2007

and aborted transactions have different effects; also teach it not to assume
that prepared transactions are always committed.

Along the way, simplify the pgstats API by tying counting directly to
Relations; I cannot detect any redeeming social value in having stats
pointers in HeapScanDesc and IndexScanDesc structures. And fix a few
corner cases in which counts might be missed because the relation's
pgstat_info pointer hadn't been set.

03 May, 2007 2 commits

Dept. of second thoughts: add comments cautioning against using · 63735ca8

Committed by Tom Lane on May 02, 2007

ReadOrZeroBuffer to fetch pages from beyond physical EOF.  This would
usually work, but would cause problems for md.c if writes occurred
beyond a segment boundary when the previous segment file hadn't been
fully extended.

During WAL recovery, when reading a page that we intend to overwrite completely · 8c3cc86e

Committed by Tom Lane on May 02, 2007

from the WAL data, don't bother to physically read it; just have bufmgr.c
return a zeroed-out buffer instead. This speeds recovery significantly,
and also avoids unnecessary failures when a page-to-be-overwritten has corrupt
page headers on disk. This replaces a former kluge that accomplished the
latter by pretending zero_damaged_pages was always ON during WAL recovery;
which was OK when the kluge was put in, but is unsafe when restoring a WAL
log that was written with full_page_writes off.

Heikki Linnakangas
