  1. 02 Jun 2007, 2 commits
  2. 01 Jun 2007, 2 commits
    • Tom Lane · 1f559b7d
      Fix several hash functions that were taking chintzy shortcuts instead of
      delivering a well-randomized hash value.  I got religion on this after
      observing that performance of multi-batch hash join degrades terribly if the
      higher-order bits of hash values aren't random, as indeed was true for say
      hashes of small integer values.  It's now expected and documented that hash
      functions should use hash_any or some comparable method to ensure that all
      bits of their output are about equally random.
      
      initdb forced because this change invalidates existing hash indexes.  For the
      same reason, this isn't back-patchable; the hash join performance problem
      will get a band-aid fix in the back branches.
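      
      The point about well-randomized high-order bits can be seen with a tiny standalone C sketch
      (illustrative only; this is not PostgreSQL's hash_any): a shortcut hash of small integers leaves
      the high bits constant, while a generic avalanche finalizer spreads entropy across all 32 bits,
      which is what multi-batch hash join needs when it splits on those bits.
      
          #include <stdint.h>
          #include <stdio.h>
          
          /* Shortcut "hash": the value itself.  For small integers the high-order
           * bits are always zero, so bucketing on those bits degenerates. */
          static uint32_t shortcut_hash(uint32_t v) { return v; }
          
          /* A generic 32-bit avalanche finalizer (Murmur3-style), shown only to
           * illustrate "make every output bit depend on every input bit"; it is
           * not PostgreSQL's hash_any. */
          static uint32_t mixed_hash(uint32_t v)
          {
              v ^= v >> 16;
              v *= 0x85ebca6bu;
              v ^= v >> 13;
              v *= 0xc2b2ae35u;
              v ^= v >> 16;
              return v;
          }
          
          int main(void)
          {
              for (uint32_t v = 1; v <= 4; v++)
                  printf("value %u: shortcut high byte = 0x%02x, mixed high byte = 0x%02x\n",
                         (unsigned) v,
                         (unsigned) (shortcut_hash(v) >> 24),
                         (unsigned) (mixed_hash(v) >> 24));
              return 0;
          }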
    • Tom Lane · 10f719af
      Change build_index_pathkeys() so that the expressions it builds to represent
      index key columns always have the type expected by the index's associated
      operators, ie, we add RelabelType nodes when dealing with binary-compatible
      index opclasses.  This is needed to get varchar indexes to play nicely with
      the new EquivalenceClass machinery, as per recent gripe from Josh Berkus that
      CVS HEAD was failing to match a varchar index column to a constant restriction
      in the query.
      
      It seems likely that this change will allow removal of a lot of ugly ad-hoc
      RelabelType-stripping that the planner has traditionally done while matching
      expressions to other expressions, but I'll worry about that some other day.
  3. 31 May 2007, 2 commits
    • Tom Lane · d526575f
      Make large sequential scans and VACUUMs work in a limited-size "ring" of
      buffers, rather than blowing out the whole shared-buffer arena.  Aside from
      avoiding cache spoliation, this fixes the problem that VACUUM formerly tended
      to cause a WAL flush for every page it modified, because we had it hacked to
      use only a single buffer.  Those flushes will now occur only once per
      ring-ful.  The exact ring size, and the threshold for seqscans to switch into
      the ring usage pattern, remain under debate; but the infrastructure seems
      done.  The key bit of infrastructure is a new optional BufferAccessStrategy
      object that can be passed to ReadBuffer operations; this replaces the former
      StrategyHintVacuum API.
      
      This patch also changes the buffer usage-count methodology a bit: we now
      advance usage_count when first pinning a buffer, rather than when last
      unpinning it.  To preserve the behavior that a buffer's lifetime starts to
      decrease when it's released, the clock sweep code is modified to not decrement
      usage_count of pinned buffers.
      
      Work not done in this commit: teach GiST and GIN indexes to use the vacuum
      BufferAccessStrategy for vacuum-driven fetches.
      
      Original patch by Simon, reworked by Heikki and again by Tom.
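      
      A minimal standalone sketch of the ring idea, with hypothetical names rather than the real
      BufferAccessStrategy API: a bulk scan recycles a small fixed set of buffer slots round-robin, so
      even an arbitrarily large seqscan or VACUUM touches only a handful of shared buffers.
      
          #include <stdio.h>
          
          #define RING_SIZE 8     /* illustrative; the real ring size was still under debate */
          
          /* A "strategy" is just a ring of buffer slot numbers plus a cursor. */
          typedef struct
          {
              int slots[RING_SIZE];
              int current;
          } RingStrategy;
          
          /* Pick the next buffer for a bulk scan: reuse the ring slots round-robin,
           * so a large sequential scan cycles within RING_SIZE buffers instead of
           * spreading across the whole buffer pool. */
          static int ring_next_buffer(RingStrategy *ring)
          {
              ring->current = (ring->current + 1) % RING_SIZE;
              return ring->slots[ring->current];
          }
          
          int main(void)
          {
              RingStrategy ring = {{10, 11, 12, 13, 14, 15, 16, 17}, -1};
          
              for (int page = 0; page < 20; page++)       /* pretend 20-page seqscan */
                  printf("page %2d -> buffer slot %d\n", page, ring_next_buffer(&ring));
              return 0;
          }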
    • Tom Lane · 14c4d3de
      Fix trivial misspelling in comment.
  4. 29 May 2007, 1 commit
  5. 28 May 2007, 1 commit
  6. 27 May 2007, 2 commits
    • Tom Lane · 8d675c85
      pgstat's on-proc-exit hook has to execute after the last transaction commit
      or abort within a backend; rearrange InitPostgres processing to make it so.
      Revealed by just-added Asserts along with ECPG regression tests (hm, I wonder
      why the core regression tests didn't expose it?).  This possibly is another
      reason for missing stats updates ...
    • Tom Lane · 77947c51
      Fix up pgstats counting of live and dead tuples to recognize that committed
      and aborted transactions have different effects; also teach it not to assume
      that prepared transactions are always committed.
      
      Along the way, simplify the pgstats API by tying counting directly to
      Relations; I cannot detect any redeeming social value in having stats
      pointers in HeapScanDesc and IndexScanDesc structures.  And fix a few
      corner cases in which counts might be missed because the relation's
      pgstat_info pointer hadn't been set.
  7. 26 May 2007, 1 commit
    • Tom Lane · 604ffd28
      Create hooks to let a loadable plugin monitor (or even replace) the planner
      and/or create plans for hypothetical situations; in particular, investigate
      plans that would be generated using hypothetical indexes.  This is a
      heavily-rewritten version of the hooks proposed by Gurjeet Singh for his
      Index Advisor project.  In this formulation, the index advisor can be
      entirely a loadable module instead of requiring a significant part to be
      in the core backend, and plans can be generated for hypothetical indexes
      without requiring the creation and rolling-back of system catalog entries.
      
      The index advisor patch as-submitted is not compatible with these hooks,
      but it needs significant work anyway due to other 8.2-to-8.3 planner
      changes.  With these hooks in the core backend, development of the advisor
      can proceed as a pgfoundry project.
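      
      The general shape of such a hook, as a self-contained C sketch (types and names here are
      illustrative, not the exact backend signatures): a module saves the existing hook pointer,
      installs its own function, and chains to the saved hook or to the standard implementation.
      
          #include <stdio.h>
          
          typedef int Plan;                             /* stand-in for a planner result */
          typedef Plan (*planner_hook_fn)(int query);
          
          static planner_hook_fn planner_hook = NULL;   /* global hook slot */
          
          static Plan standard_planner_stub(int query)  /* the built-in behavior */
          {
              return query * 10;
          }
          
          /* Entry point the core would call: use the hook if one is installed. */
          static Plan plan_query(int query)
          {
              return planner_hook ? planner_hook(query) : standard_planner_stub(query);
          }
          
          /* A loadable "advisor" hook: observe the query, possibly inject hypothetical
           * indexes, then delegate to whatever was installed before us. */
          static planner_hook_fn prev_planner_hook = NULL;
          
          static Plan advisor_hook(int query)
          {
              printf("advisor saw query %d (could add hypothetical indexes here)\n", query);
              return prev_planner_hook ? prev_planner_hook(query)
                                       : standard_planner_stub(query);
          }
          
          int main(void)
          {
              prev_planner_hook = planner_hook;   /* save whatever was there */
              planner_hook = advisor_hook;        /* install ourselves */
          
              printf("plan = %d\n", plan_query(7));
              return 0;
          }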
  8. 23 May 2007, 1 commit
    • Tom Lane · 11086f2f
      Repair planner bug introduced in 8.2 by ability to rearrange outer joins:
      in cases where a sub-SELECT inserts a WHERE clause between two outer joins,
      that clause may prevent us from re-ordering the two outer joins.  The code
      was considering only the joins' own ON-conditions in determining reordering
      safety, which is not good enough.  Add a "delay_upper_joins" flag to
      OuterJoinInfo to flag that we have detected such a clause and higher-level
      outer joins shouldn't be permitted to commute with this one.  (This might
      seem overly coarse, but given the current rules for OJ reordering, it's
      sufficient AFAICT.)
      
      The failure case is actually pretty narrow: it needs a WHERE clause within
      the RHS of a left join that checks the RHS of a lower left join, but is not
      strict for that RHS (else we'd have simplified the lower join to a plain
      join).  Even then no failure will be manifest unless the planner chooses to
      rearrange the join order.
      
      Per bug report from Adam Terrey.
  9. 22 May 2007, 3 commits
    • Tom Lane · d7153c5f
      Fix best_inner_indexscan to return both the cheapest-total-cost and
      cheapest-startup-cost innerjoin indexscans, and make joinpath.c consider
      both of these (when different) as the inside of a nestloop join.  The
      original design was based on the assumption that indexscan paths always
      have negligible startup cost, and so total cost is the only important
      figure of merit; an assumption that's obviously broken by bitmap
      indexscans.  This oversight could lead to choosing poor plans in cases
      where fast-start behavior is more important than total cost, such as
      LIMIT and IN queries.  8.1-vintage brain fade exposed by an example from
      Chuck D.
    • Tom Lane · 2415ad98
      Teach tuplestore.c to throw away data before the "mark" point when the caller
      is using mark/restore but not rewind or backward-scan capability.  Insert a
      materialize plan node between a mergejoin and its inner child if the inner
      child is a sort that is expected to spill to disk.  The materialize shields
      the sort from the need to do mark/restore and thereby allows it to perform
      its final merge pass on-the-fly; while the materialize itself is normally
      cheap since it won't spill to disk unless the number of tuples with equal
      key values exceeds work_mem.
      
      Greg Stark, with some kibitzing from Tom Lane.
    • Peter Eisentraut · 3963574d
      XPath fixes:
       - Function renamed to "xpath".
       - Function is now strict, per discussion.
       - Return empty array in case when XPath expression detects nothing
         (previously, NULL was returned in such case), per discussion.
       - (bugfix) Work with fragments with prologue: select xpath('/a',
         '<?xml version="1.0"?><a /><b />'); // now XML datum is always wrapped
         with dummy <x>...</x>, XML prologue simply goes away (if any).
       - Some cleanup.
      
      Nikolay Samokhvalov
      
      Some code cleanup and documentation work by myself.
  10. 21 May 2007, 1 commit
    • Tom Lane · a8d539f1
      To support external compression of archived WAL data, add a flag bit to
      WAL records that shows whether it is safe to remove full-page images
      (ie, whether or not an on-line backup was in progress when the WAL entry
      was made).  Also make provision for an XLOG_NOOP record type that can be
      used to fill in the extra space when decompressing the data for restore.
      
      This is the portion of Koichi Suzuki's "full page writes" patch that
      has to go into the core database.  The remainder of that work is two
      external compression and decompression programs, which for the time being
      will undergo separate development on pgfoundry.  Per discussion.
      
      Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be
      possible to compress them (the previous coding caused essential info
      to be omitted).  The other commonly-used record types seem OK already,
      with the possible exception of GIN and GIST WAL records, which I don't
      understand well enough to opine on.
  11. 19 May 2007, 1 commit
    • Alvaro Herrera · b40776d2
      Have CLUSTER advance the table's relfrozenxid. The new frozen point is the
      FreezeXid introduced in a recent commit, so there isn't any data loss in this
      approach.
      
      Doing it causes ALTER TABLE (or rather, the forms of it that cause a full table
      rewrite) to be affected as well.  In this case, the frozen point is RecentXmin,
      because after the rewrite all the tuples are relabeled with the rewriting
      transaction's Xid.
      
      TOAST tables are fixed automatically as well, as fallout of the way they were
      already being handled in the respective code paths.
      
      With this patch, there is no longer need to VACUUM tables for Xid wraparound
      purposes that have been cleaned up via TRUNCATE or CLUSTER.
  12. 18 May 2007, 2 commits
    • Tom Lane · dbb76935
      Temporary fix for the problem that pg_stat_activity, inet_client_addr(),
      and inet_server_addr() fail if the client connected over a "scoped" IPv6
      address.  In this case getnameinfo() will return a string ending with
      a poorly-standardized "%something" zone specifier, which these functions
      try to feed to network_in(), which won't take it.  So that we don't lose
      functionality altogether, suppress the zone specifier before giving the
      string to network_in().  Per report from Brian Hirt.
      
      TODO: probably someday the inet type should support scoped IPv6 addresses,
      and then this patch should be reverted.
      
      Backpatch to 8.2 ... is it worth going further?
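      
      A standalone sketch of the workaround described above (hypothetical helper name, not the
      committed code): cut the string at the '%' so the remainder is acceptable to a parser that
      knows nothing about scoped IPv6 addresses.
      
          #include <stdio.h>
          #include <string.h>
          
          /* Strip a "%zone" suffix, e.g. "fe80::1%eth0" -> "fe80::1", so the
           * remaining text can be handed to an address parser that does not
           * understand scoped IPv6 addresses. */
          static void strip_zone(char *addr)
          {
              char *pct = strchr(addr, '%');
          
              if (pct != NULL)
                  *pct = '\0';
          }
          
          int main(void)
          {
              char addr[] = "fe80::1%eth0";
          
              strip_zone(addr);
              printf("%s\n", addr);       /* prints fe80::1 */
              return 0;
          }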
    • Tom Lane · b11123b6
      Fix parameter recalculation for Limit nodes: during a ReScan call we must
      recompute the limit/offset immediately, so that the updated values are
      available when the child's ReScan function is invoked.  Add a regression
      test for this, too.  Bug is new in HEAD (due to the bounded-sorting patch)
      so no need for back-patch.
      
      I did not do anything about merging this signaling with chgParam processing,
      but if we were to do that we'd still need to compute the updated values
      at this point rather than during the first ProcNode call.
      
      Per observation and test case from Greg Stark, though I didn't use his patch.
  13. 17 May 2007, 2 commits
  14. 16 May 2007, 1 commit
  15. 12 May 2007, 3 commits
    • Tom Lane · 9aa3c782
      Fix the problem that creating a user-defined type named _foo, followed by one
      named foo, would work but the other ordering would not.  If a user-specified
      type or table name collides with an existing auto-generated array name, just
      rename the array type out of the way by prepending more underscores.  This
      should not create any backward-compatibility issues, since the cases in which
      this will happen would have failed outright in prior releases.
      
      Also fix an oversight in the arrays-of-composites patch: ALTER TABLE RENAME
      renamed the table's rowtype but not its array type.
    • Tom Lane · d8326119
      Fix my oversight in enabling domains-of-domains: ALTER DOMAIN ADD CONSTRAINT
      needs to check the new constraint against columns of derived domains too.
      
      Also, make it error out if the domain to be modified is used within any
      composite-type columns.  Eventually we should support that case, but it seems
      a bit painful, and not suitable for a back-patch.  For the moment just let the
      user know we can't do it.
      
      Backpatch to 8.2, which is the only released version that allows nested
      domains.  Possibly the other part should be back-patched further.
    • Tom Lane · bc8036fc
      Support arrays of composite types, including the rowtypes of regular tables
      and views (but not system catalogs, nor sequences or toast tables).  Get rid
      of the hardwired convention that a type's array type is named exactly "_type",
      instead using a new column pg_type.typarray to provide the linkage.  (It still
      will be named "_type", though, except in odd corner cases such as
      maximum-length type names.)
      
      Along the way, make tracking of owner and schema dependencies for types more
      uniform: a type directly created by the user has these dependencies, while a
      table rowtype or auto-generated array type does not have them, but depends on
      its parent object instead.
      
      David Fetter, Andrew Dunstan, Tom Lane
  16. 09 May 2007, 2 commits
    • Tom Lane · 5b7cf08d
      Reserve some pg_statistic "kind" codes for use by the ESRI ST_Geometry
      datatype project.  Per request from Ale Raza (araza at esri.com).
    • Neil Conway · ade493e0
      Add a hash function for "numeric". Mark the equality operator for
      numerics as "oprcanhash", and make the corresponding system catalog
      updates. As a result, hash indexes, hashed aggregation, and hash
      joins can now be used with the numeric type. Bump the catversion.
      
      The only tricky aspect to doing this is writing a correct hash
      function: it's possible for two Numerics to be equal according to
      their equality operator, but have different in-memory bit patterns.
      To cope with this, the hash function doesn't consider the Numeric's
      "scale" or "sign", and explicitly skips any leading or trailing
      zeros in the Numeric's digit buffer (the current implementation
      should suppress any such zeros, but it seems unwise to rely upon
      this). See discussion on pgsql-patches for more details.
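      
      The key idea, sketched as self-contained C with a toy digit layout (not the real Numeric
      representation): hash only the significant digits, ignoring sign and scale and skipping leading
      and trailing zero digits, so any two values that compare equal produce the same hash.
      
          #include <stdint.h>
          #include <stdio.h>
          
          /* Hash a digit buffer while skipping leading and trailing zero digits,
           * so equal values with different stored forms hash alike. */
          static uint32_t digits_hash(const int16_t *digits, int ndigits)
          {
              int start = 0, end = ndigits;
          
              while (start < end && digits[start] == 0)
                  start++;                            /* skip leading zeros */
              while (end > start && digits[end - 1] == 0)
                  end--;                              /* skip trailing zeros */
          
              uint32_t h = 0;
              for (int i = start; i < end; i++)
                  h = h * 31 + (uint32_t) digits[i];  /* toy mixing step */
              return h;
          }
          
          int main(void)
          {
              int16_t a[] = {0, 12, 3400, 0};     /* same significant digits ... */
              int16_t b[] = {12, 3400};           /* ... stored without the zeros */
          
              printf("%u %u\n", digits_hash(a, 4), digits_hash(b, 2));  /* equal */
              return 0;
          }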
  17. 05 May 2007, 1 commit
  18. 04 May 2007, 4 commits
    • c7464720
    • Tom Lane · 79ca7ffe
      A few fixups in error handling: mark pg_re_throw() as noreturn for gcc,
      and for other compilers, insert a dummy exit() call so that they understand
      PG_RE_THROW() doesn't return.  Insert fflush(stderr) in ExceptionalCondition,
      per recent buildfarm evidence that that might not happen automatically on some
      platforms.  And const-ify ExceptionalCondition's declaration while at it.
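      
      The idiom being described, as a generic standalone sketch (macro and function names are
      illustrative, not the elog.h definitions): gcc gets a noreturn attribute, and other compilers get
      a dummy exit() after the call so their flow analysis also sees that the macro cannot fall through.
      
          #include <stdio.h>
          #include <stdlib.h>
          
          #ifdef __GNUC__
          #define NORETURN __attribute__((noreturn))
          #define RE_THROW()  re_throw()
          #else
          #define NORETURN
          /* Dummy exit() so compilers without the attribute still see that
           * RE_THROW() cannot fall through. */
          #define RE_THROW()  (re_throw(), exit(1))
          #endif
          
          static NORETURN void re_throw(void)
          {
              fflush(stderr);     /* flush pending diagnostics before bailing out */
              abort();            /* stand-in for longjmp'ing back to an error handler */
          }
          
          int value_or_die(int v)
          {
              if (v >= 0)
                  return v;
              RE_THROW();         /* the attribute (or the dummy exit) keeps flow
                                   * analysis from warning about falling off the end */
          }
          
          int main(void)
          {
              printf("%d\n", value_or_die(42));
              return 0;
          }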
    • Tom Lane · d26559db
      Teach tuplesort.c about "top N" sorting, in which only the first N tuples
      need be returned.  We keep a heap of the current best N tuples and sift-up
      new tuples into it as we scan the input.  For M input tuples this means
      only about M*log(N) comparisons instead of M*log(M), not to mention a lot
      less workspace when N is small --- avoiding spill-to-disk for large M
      is actually the most attractive thing about it.  Patch includes planner
      and executor support for invoking this facility in ORDER BY ... LIMIT
      queries.  Greg Stark, with some editorialization by moi.
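      
      A self-contained C sketch of the bounded "top N" idea (not the tuplesort.c code): keep a heap of
      the N best values seen so far and sift each new input against it, so each of the M inputs costs at
      most about log(N) comparisons and only N values are ever stored.
      
          #include <stdio.h>
          
          #define LIMIT 3   /* keep only the 3 smallest inputs, as in ORDER BY ... LIMIT 3 */
          
          /* Sift heap[i] down in a max-heap of "count" elements. */
          static void sift_down(int *heap, int count, int i)
          {
              for (;;)
              {
                  int largest = i, l = 2 * i + 1, r = 2 * i + 2;
          
                  if (l < count && heap[l] > heap[largest]) largest = l;
                  if (r < count && heap[r] > heap[largest]) largest = r;
                  if (largest == i)
                      return;
                  int tmp = heap[i]; heap[i] = heap[largest]; heap[largest] = tmp;
                  i = largest;
              }
          }
          
          int main(void)
          {
              int input[] = {42, 7, 19, 3, 88, 15, 2, 61};
              int n = (int) (sizeof(input) / sizeof(input[0]));
              int heap[LIMIT];
              int filled = 0;
          
              for (int k = 0; k < n; k++)
              {
                  if (filled < LIMIT)
                  {
                      heap[filled++] = input[k];
                      if (filled == LIMIT)            /* heapify once the heap is full */
                          for (int i = LIMIT / 2 - 1; i >= 0; i--)
                              sift_down(heap, LIMIT, i);
                  }
                  else if (input[k] < heap[0])
                  {
                      /* New value beats the current worst survivor: replace and
                       * re-sift, costing O(log LIMIT) comparisons per input. */
                      heap[0] = input[k];
                      sift_down(heap, LIMIT, 0);
                  }
              }
          
              /* The heap now holds the LIMIT smallest values (unsorted). */
              for (int i = 0; i < LIMIT; i++)
                  printf("%d ", heap[i]);
              printf("\n");
              return 0;
          }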
    • Tom Lane · 0fef38da
      Tweak hash index AM to use the new ReadOrZeroBuffer bufmgr API when fetching
      pages it intends to zero immediately.  Just to show there is some use for that
      function besides WAL recovery :-).
      Along the way, fold _hash_checkpage and _hash_pageinit calls into _hash_getbuf
      and friends, instead of expecting callers to do that separately.
  19. 03 May 2007, 1 commit
    • Tom Lane · 8c3cc86e
      During WAL recovery, when reading a page that we intend to overwrite completely
      from the WAL data, don't bother to physically read it; just have bufmgr.c
      return a zeroed-out buffer instead.  This speeds recovery significantly,
      and also avoids unnecessary failures when a page-to-be-overwritten has corrupt
      page headers on disk.  This replaces a former kluge that accomplished the
      latter by pretending zero_damaged_pages was always ON during WAL recovery;
      which was OK when the kluge was put in, but is unsafe when restoring a WAL
      log that was written with full_page_writes off.
      
      Heikki Linnakangas
  20. 02 May 2007, 1 commit
    • Tom Lane · 88f1fd29
      Fix oversight in PG_RE_THROW processing: it's entirely possible that there
      isn't any place to throw the error to.  If so, we should treat the error
      as FATAL, just as we would have if it'd been thrown outside the PG_TRY
      block to begin with.
      
      Although this is clearly a *potential* source of bugs, it is not clear
      at the moment whether it is an *actual* source of bugs; there may not
      presently be any PG_TRY blocks in code that can be reached with no outer
      longjmp catcher.  So for the moment I'm going to be conservative and not
      back-patch this.  The change breaks ABI for users of PG_RE_THROW and hence
      might create compatibility problems for loadable modules, so we should not
      put it into released branches without proof that it's needed.
  21. 01 May 2007, 2 commits
  22. 30 Apr 2007, 1 commit
    • Tom Lane · 957d08c8
      Implement rate-limiting logic on how often backends will attempt to send
      messages to the stats collector.  This avoids the problem that enabling
      stats_row_level for autovacuum has a significant overhead for short
      read-only transactions, as noted by Arjen van der Meijden.  We can avoid
      an extra gettimeofday call by piggybacking on the one done for WAL-logging
      xact commit or abort (although that doesn't help read-only transactions,
      since they don't WAL-log anything).
      
      In my proposal for this, I noted that we could change the WAL log entries
      for commit/abort to record full TimestampTz precision, instead of only
      time_t as at present.  That's not done in this patch, but will be committed
      separately.
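      
      The rate-limiting pattern, as a standalone sketch with illustrative names and interval: remember
      when the last report was sent and skip further sends until a minimum interval has elapsed; the
      timestamp is passed in so one obtained for another purpose (such as commit WAL-logging) can be
      reused instead of calling gettimeofday again.
      
          #include <stdio.h>
          
          #define MIN_REPORT_INTERVAL 0.5     /* seconds between reports; illustrative value */
          
          static double last_report = -1.0;
          
          /* Send at most one report per MIN_REPORT_INTERVAL.  "now" is passed in so a
           * timestamp obtained elsewhere (e.g. at commit time) can be reused. */
          static void maybe_report(double now, int counter)
          {
              if (last_report >= 0.0 && now - last_report < MIN_REPORT_INTERVAL)
                  return;                             /* too soon, keep accumulating */
              printf("reporting counter=%d at t=%.1f\n", counter, now);
              last_report = now;
          }
          
          int main(void)
          {
              /* Simulated transaction end times: only some of them trigger a report. */
              double times[] = {0.0, 0.1, 0.2, 0.7, 0.8, 1.5};
          
              for (int i = 0; i < 6; i++)
                  maybe_report(times[i], i);
              return 0;
          }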
  23. 28 Apr 2007, 1 commit
    • Tom Lane · bbbe825f
      Modify processing of DECLARE CURSOR and EXPLAIN so that they can resolve the
      types of unspecified parameters when submitted via extended query protocol.
      This worked in 8.2 but I had broken it during plancache changes.  DECLARE
      CURSOR is now treated almost exactly like a plain SELECT through parse
      analysis, rewrite, and planning; only just before sending to the executor
      do we divert it away to ProcessUtility.  This requires a special-case check
      in a number of places, but practically all of them were already special-casing
      SELECT INTO, so it's not too ugly.  (Maybe it would be a good idea to merge
      the two by treating IntoClause as a form of utility statement?  Not going to
      worry about that now, though.)  That approach doesn't work for EXPLAIN,
      however, so for that I punted and used a klugy solution of running parse
      analysis an extra time if under extended query protocol.
  24. 27 Apr 2007, 2 commits
    • Tom Lane · a2e923a6
      Fix dynahash.c to suppress hash bucket splits while a hash_seq_search() scan
      is in progress on the same hashtable.  This seems the least invasive way to
      fix the recently-recognized problem that a split could cause the scan to
      visit entries twice or (with much lower probability) miss them entirely.
      The only field-reported problem caused by this is the "failed to re-find
      shared lock object" PANIC in COMMIT PREPARED reported by Michel Dorochevsky,
      which was caused by multiply visited entries.  However, it seems certain
      that mdsync() is vulnerable to missing required fsync's due to missed
      entries, and I am fearful that RelationCacheInitializePhase2() might be at
      risk as well.  Because of that and the generalized hazard presented by this
      bug, back-patch all the supported branches.
      
      Along the way, fix pg_prepared_statement() and pg_cursor() to not assume
      that the hashtables they are examining will stay static between calls.
      This is risky regardless of the newly noted dynahash problem, because
      hash_seq_search() has never promised to cope with deletion of table entries
      other than the just-returned one.  There may be no bug here because the only
      supported way to call these functions is via ExecMakeTableFunctionResult()
      which will cycle them to completion before doing anything very interesting,
      but it seems best to get rid of the assumption.  This affects 8.2 and HEAD
      only, since those functions weren't there earlier.
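      
      A toy standalone sketch of the guard being described (not dynahash itself): count active
      sequential scans and defer bucket splits while any are in progress, so a scan never sees an entry
      move to another bucket (and so never visits it twice or misses it).
      
          #include <stdio.h>
          #include <stdbool.h>
          
          /* Toy table state: just the bookkeeping relevant to the fix. */
          static int active_scans = 0;        /* how many seq scans are in progress */
          static bool split_pending = false;  /* a split we postponed */
          
          static void maybe_split_bucket(void)
          {
              if (active_scans > 0)
              {
                  /* Splitting now could move entries a concurrent scan has not yet
                   * visited (or revisit ones it has), so postpone the split. */
                  split_pending = true;
                  return;
              }
              printf("splitting bucket now\n");
              split_pending = false;
          }
          
          static void begin_scan(void) { active_scans++; }
          
          static void end_scan(void)
          {
              active_scans--;
              if (active_scans == 0 && split_pending)
                  maybe_split_bucket();       /* safe to catch up on deferred work */
          }
          
          int main(void)
          {
              begin_scan();
              maybe_split_bucket();   /* deferred: a scan is active */
              end_scan();             /* the deferred split happens here */
              return 0;
          }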
    • Neil Conway · 16efdb5e
      Rename the newly-added commands for discarding session state.
      RESET SESSION, RESET PLANS, and RESET TEMP are now DISCARD ALL,
      DISCARD PLANS, and DISCARD TEMP, respectively. This is to avoid
      confusion with the pre-existing RESET variants: the DISCARD
      commands are not actually similar to RESET. Patch from Marko
      Kreen, with some minor editorialization.