Commits · d26888bc4d1e539a82f21382b0000fe5bbf889d9 · Greenplum / Gpdb

18 Jul, 2013 5 commits

Move checking an explicit VARIADIC "any" argument into the parser. · d26888bc

Committed by Andrew Dunstan on Jul 18, 2013

This is more efficient and simpler. It does mean that an untyped NULL
can no longer be used in such cases, which should be mentioned in
Release Notes, but doesn't seem a terrible loss. The workaround is to
cast the NULL to some array type.

Pavel Stehule, reviewed by Jeevan Chalke.

d26888bc

Fix direct access to Relation->rd_indpred. · 405a468b

Committed by Tom Lane on Jul 18, 2013

Should use RelationGetIndexPredicate(), since rd_indpred is just a cache
that is not computed until/unless demanded. Per buildfarm failure on
CLOBBER_CACHE_ALWAYS animals; diagnosis and fix by Hitoshi Harada.

405a468b

Fix variable names mentioned in comment to match the code. · 107cbc90

Committed by Heikki Linnakangas on Jul 17, 2013

Also, in another comment, explain why holding an insertion slot is a
critical section.

Per review by Amit Kapila.

107cbc90

Fix assert failure at end of recovery, broken by XLogInsert scaling patch. · 59c02a36

Committed by Heikki Linnakangas on Jul 17, 2013

Initialization of the first XLOG buffer at end-of-recovery was broken for
the case that the last read WAL record ended at a page boundary. Instead of
trying to copy the last full xlog page to the buffer cache in that case,
just set shared state so that the next page is initialized when the first
WAL record after startup is inserted. (That's what we did in earlier
versions, too.)

To make the shared state required for that case less surprising, replace the
XLogCtl->curridx variable, which was the index of the latest initialized
buffer, with an XLogRecPtr of how far the buffers have been initialized.
That also allows us to get rid of the XLogRecEndPtrToBufIdx macro.

While we're at it, make a similar change for XLogCtl->Write.curridx, getting
rid of that variable and calculating the next buffer to write from
XLogCtl->LogwrtResult instead.

59c02a36

Fix end-of-loop optimization in pglz_find_match() function. · 3f2adace

Committed by Heikki Linnakangas on Jul 17, 2013

After the recent pglz optimization patch, the next/prev pointers in the
hash table are never NULL; INVALID_ENTRY_PTR is used to represent invalid
entries instead. The end-of-loop check in pglz_find_match() function didn't
get the memo. The result was the same from a correctness point of view, but
because the NULL-check would never fail, the tiny optimization turned into
a pessimization.

Reported by Stephen Frost, using Coverity scanner.

3f2adace

17 Jul, 2013 5 commits

Fix systable_recheck_tuple() for MVCC scan snapshots. · ffcf6545

Committed by Noah Misch on Jul 16, 2013

Since this function assumed non-MVCC snapshots, it broke when commit
568d4138 switched its one caller from SnapshotNow scans to MVCC-snapshot
scans.

Reviewed by Robert Haas, Tom Lane and Andres Freund.

ffcf6545

Implement the FILTER clause for aggregate function calls. · b560ec1b

Committed by Noah Misch on Jul 16, 2013

This is SQL-standard with a few extensions, namely support for
subqueries and outer references in clause expressions.

catversion bump due to change in Aggref and WindowFunc.

David Fetter, reviewed by Dean Rasheed.

b560ec1b

Comment on why planagg.c punts "MIN(x ORDER BY y)". · 7a8e9f29

Committed by Noah Misch on Jul 16, 2013

7a8e9f29

Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. · cc1965a9

Committed by Kevin Grittner on Jul 16, 2013

This allows reads to continue without any blocking while a REFRESH
runs.  The new data appears atomically as part of transaction
commit.

Review questioned the Assert that a matview was not a system
relation.  This will be addressed separately.

Reviewed by Hitoshi Harada, Robert Haas, Andres Freund.
Merged after review with security patch f3ab5d46.

cc1965a9

Allow background workers to be started dynamically. · 7f7485a0

Committed by Robert Haas on Jul 16, 2013

There is a new API, RegisterDynamicBackgroundWorker, which allows
an ordinary user backend to register a new background worker during
normal running.  This means that it's no longer necessary for all
background workers to be registered during processing of
shared_preload_libraries, although the option of registering workers
at that time remains available.

When a background worker exits and will not be restarted, the
slot previously used by that background worker is automatically
released and becomes available for reuse.  Slots used by background
workers that are configured for automatic restart can't (yet) be
released without shutting down the system.

This commit adds a new source file, bgworker.c, and moves some
of the existing control logic for background workers there.
Previously, there was little enough logic that it made sense to
keep everything in postmaster.c, but not any more.

This commit also makes the worker_spi contrib module into an
extension and adds a new function, worker_spi_launch, which can
be used to demonstrate the new facility.

7f7485a0

16 Jul, 2013 2 commits

Check get_tle_by_resno() result before deref · 4ed22e89

Committed by Stephen Frost on Jul 15, 2013

When creating a sort to support a group by, we need to look up the
target entry in the target list by the resno using get_tle_by_resno().
This particular code-path didn't check the result prior to attempting
to dereference it, while all other callers did.  While I can't see a
way for this usage of get_tle_by_resno() to fail (you can't ask for
a column to be sorted on which isn't included in the group by), it's
probably best to check that we didn't end up with a NULL somehow
anyway than risk the segfault.

I'm willing to back-patch this if others feel it's necessary, but my
guess is new features are what might tickle this rather than anything
existing.

Missing check spotted by the Coverity scanner.

4ed22e89

Assert that syscache lookups don't happen outside transactions. · 42c80c69

Committed by Robert Haas on Jul 15, 2013

Andres Freund

42c80c69

15 Jul, 2013 1 commit

Ensure 64bit arithmetic when calculating tapeSpace · 273dcd16

Committed by Stephen Frost on Jul 14, 2013

In tuplesort.c:inittapes(), we calculate tapeSpace by first figuring
out how many 'tapes' we can use (maxTapes) and then multiplying the
result by the tape buffer overhead for each.  Unfortunately, when
we are on a system with an 8-byte long, we allow work_mem to be
larger than 2GB and that allows maxTapes to be large enough that the
32bit arithmetic can overflow when multiplied against the buffer
overhead.

When this overflow happens, we end up adding the overflow to the
amount of space available, causing the amount of memory allocated to
be larger than work_mem.

Note that to reach this point, you have to set work_mem to at least
24GB and be sorting a set which is at least that size.  Given that a
user who can set work_mem to 24GB could also set it even higher, if
they were looking to run the system out of memory, this isn't
considered a security issue.

This overflow risk was found by the Coverity scanner.

Back-patch to all supported branches, as this issue has existed
since before 8.4.

273dcd16
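
The overflow described in 273dcd16 can be illustrated with a short Python sketch that emulates C's 32-bit signed arithmetic. The helper names and the per-tape overhead of 24576 bytes are assumptions chosen for illustration; they are not taken from tuplesort.c.

```python
def wrap_int32(value):
    """Emulate the wraparound of a C 32-bit signed int (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def tape_space_32bit(max_tapes, overhead_per_tape):
    """Multiply as if both operands and the result were 32-bit ints."""
    return wrap_int32(max_tapes * overhead_per_tape)


def tape_space_64bit(max_tapes, overhead_per_tape):
    """Multiply in a wide type; Python ints never overflow."""
    return max_tapes * overhead_per_tape


# Hypothetical values: 100000 tapes at 24576 bytes of buffer overhead each.
narrow = tape_space_32bit(100_000, 24_576)
wide = tape_space_64bit(100_000, 24_576)
print(narrow, wide)
```

With these inputs the narrow product wraps to a negative number, so subtracting it from the available memory would increase the budget instead of reducing it, which is the symptom the commit message describes.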

13 Jul, 2013 3 commits

Add session_preload_libraries configuration parameter · 070518dd

Committed by Peter Eisentraut on Jun 12, 2013

This is like shared_preload_libraries except that it takes effect at
backend start and can be changed without a full postmaster restart.  It
is like local_preload_libraries except that it is still only settable by
a superuser.  This can be a better way to load modules such as
auto_explain.

Since there are now three preload parameters, regroup the documentation
a bit.  Put all parameters into one section, explain common
functionality only once, update the descriptions to reflect current and
future realities.

Reviewed-by: Dimitri Fontaine <dimitri@2ndQuadrant.fr>

070518dd

Switch user ID to the object owner when populating a materialized view. · f3ab5d46

Committed by Noah Misch on Jul 12, 2013

This makes superuser-issued REFRESH MATERIALIZED VIEW safe regardless of
the object's provenance.  REINDEX is an earlier example of this pattern.
As a downside, functions called from materialized views must tolerate
running in a security-restricted operation.  CREATE MATERIALIZED VIEW
need not change user ID.  Nonetheless, avoid creation of materialized
views that will invariably fail REFRESH by making it, too, start a
security-restricted operation.

Back-patch to 9.3 so materialized views have this from the beginning.

Reviewed by Kevin Grittner.

f3ab5d46

Make comments reflect that omission of SPI_gettypmod() is intentional. · 448fee2e

Committed by Noah Misch on Jul 12, 2013

448fee2e

10 Jul, 2013 1 commit

Fix lack of message pluralization · 8dead08c

Committed by Peter Eisentraut on Jul 09, 2013

8dead08c

09 Jul, 2013 1 commit

Fix bool abuse · 7888c612

Committed by Peter Eisentraut on Jul 08, 2013

path_encode's "closed" argument used to take three values: TRUE, FALSE,
or -1, while being of type bool.  Replace that with a three-valued enum
for more clarity.

7888c612
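
The idea behind 7888c612 can be sketched in Python as follows. The enum name, its members, and the simplified output format are hypothetical stand-ins, not the identifiers or the exact encoding used in the actual C change.

```python
from enum import Enum


class PathDelim(Enum):
    """Hypothetical three-valued replacement for a bool that also took -1."""
    NONE = "none"      # emit no enclosing delimiters
    OPEN = "open"      # emit square brackets
    CLOSED = "closed"  # emit parentheses


def encode_path(points, delim):
    """Format (x, y) pairs, wrapping them according to the delimiter mode."""
    body = ",".join(f"({x},{y})" for x, y in points)
    if delim is PathDelim.CLOSED:
        return f"({body})"
    if delim is PathDelim.OPEN:
        return f"[{body}]"
    return body


print(encode_path([(0, 0), (1, 1)], PathDelim.OPEN))
```

Naming the third state makes every call site self-describing, whereas a bool argument carrying -1 relies on the reader knowing the convention.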

08 Jul, 2013 3 commits

Fix Windows build. · f489470f

Committed by Heikki Linnakangas on Jul 08, 2013

Was broken by my xloginsert scaling patch. XLogCtl global variable needs
to be initialized in each process, as it's not inherited by fork() on
Windows.

f489470f

Improve scalability of WAL insertions. · 9a20a9b2

Committed by Heikki Linnakangas on Jul 08, 2013

This patch replaces WALInsertLock with a number of WAL insertion slots,
allowing multiple backends to insert WAL records to the WAL buffers
concurrently. This is particularly useful for parallel loading large amounts
of data on a system with many CPUs.

This has one user-visible change: switching to a new WAL segment with
pg_switch_xlog() now fills the remaining unused portion of the segment with
zeros. This potentially adds some overhead, but it has been a very common
practice by DBAs to clear the "tail" of the segment with an external
pg_clearxlogtail utility anyway, to make the WAL files compress better.
With this patch, it's no longer necessary to do that.

This patch adds a new GUC, xloginsert_slots, to tune the number of WAL
insertion slots. Performance testing suggests that the default, 8, works
pretty well for all kinds of workloads, but I left the GUC in place to allow
others with different hardware to test that easily. We might want to remove
that before release.

Reviewed by Andres Freund.

9a20a9b2

Fix planning of parameterized appendrel paths with expensive join quals. · 5372275b

Committed by Tom Lane on Jul 07, 2013

The code in set_append_rel_pathlist() for building parameterized paths
for append relations (inheritance and UNION ALL combinations) assumed
that the cheapest regular path for a child relation would still be cheapest
when reparameterized. That might not be the case, particularly if the
added join conditions are expensive to compute, as in a recent example from
Jeff Janes. Fix it to compare child path costs *after* reparameterizing.
We can short-circuit that if the cheapest pre-existing path is already
parameterized correctly, which seems likely to be true often enough to be
worth checking for.

Back-patch to 9.2 where parameterized paths were introduced.

5372275b

07 Jul, 2013 1 commit

Handle posix_fallocate() errors. · 5b571bb8

Committed by Jeff Davis on Jul 06, 2013

On some platforms, posix_fallocate() is available but may still return
EINVAL if the underlying filesystem does not support it.  So, in case
of an error, fall through to the alternate implementation that just
writes zeros.

Per buildfarm failure and analysis by Tom Lane.

5b571bb8
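
The fallback described in 5b571bb8 can be sketched in Python, whose os module exposes posix_fallocate on platforms that provide it. The function names preallocate and zero_fill, the 8192-byte chunk size, and the injectable fallocate parameter are assumptions made for this illustration and do not mirror the C code.

```python
import os
import tempfile


def zero_fill(fd, size, chunk_size=8192):
    """Fallback: extend the file by writing zeros explicitly."""
    os.lseek(fd, 0, os.SEEK_SET)
    remaining = size
    zeros = b"\0" * chunk_size
    while remaining > 0:
        written = os.write(fd, zeros[:remaining])
        remaining -= written


def preallocate(fd, size, fallocate=getattr(os, "posix_fallocate", None)):
    """Try posix_fallocate first; on any OSError fall through to zero_fill.

    Returns the name of the strategy that was used.
    """
    if fallocate is not None:
        try:
            fallocate(fd, 0, size)
            return "posix_fallocate"
        except OSError:
            pass  # e.g. EINVAL when the filesystem does not support it
    zero_fill(fd, size)
    return "zero_fill"


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        print(preallocate(f.fileno(), 16 * 1024))
```

Passing the allocation function as a parameter makes it easy to simulate a filesystem that rejects the call and to confirm that the zero-writing path still produces a file of the requested size.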

06 Jul, 2013 2 commits

Update messages, comments and documentation for materialized views. · 02d2b694

Committed by Noah Misch on Jul 05, 2013

All instances of the verbiage lagging the code.  Back-patch to 9.3,
where materialized views were introduced.

02d2b694

Use posix_fallocate() for new WAL files, where available. · 269e7808

Committed by Jeff Davis on Jul 05, 2013

This function is more efficient than actually writing out zeroes to
the new file, per microbenchmarks by Jon Nelson. Also, it may reduce
the likelihood of WAL file fragmentation.

Jon Nelson, with review by Andres Freund, Greg Smith and me.

269e7808

05 Jul, 2013 3 commits

Expose the estimation of number of changed tuples since last analyze · c87ff71f

Committed by Magnus Hagander on Jul 05, 2013

This value, now pg_stat_all_tables.n_mod_since_analyze, was already
tracked and used by autovacuum, but not exposed to the user.

Mark Kirkwood, review by Laurenz Albe

c87ff71f

Use type "int64" for memory accounting in tuplesort.c/tuplestore.c. · 79e0f87a

Committed by Noah Misch on Jul 04, 2013

Commit 263865a4 switched tuplesort.c and
tuplestore.c variables representing memory usage from type "long" to
type "Size". This was unnecessary; I thought doing so avoided overflow
scenarios on 64-bit Windows, but guc.c already limited work_mem so as to
prevent the overflow. It was also incomplete, not touching the logic
that assumed a signed data type. Change the affected variables to
"int64". This is perfect for 64-bit platforms, and it reduces the need
to contemplate platform-specific overflow scenarios. It also puts us
close to being able to support work_mem over 2 GiB on 64-bit Windows.

Per report from Andres Freund.

79e0f87a

Fix typo in comment. · 7842d41d

Committed by Fujii Masao on Jul 05, 2013

Michael Paquier

7842d41d

04 Jul, 2013 3 commits

Add new GUC, max_worker_processes, limiting number of bgworkers. · 6bc8ef0b

Committed by Robert Haas on Jul 04, 2013

In 9.3, there's no particular limit on the number of bgworkers;
instead, we just count up the number that are actually registered,
and use that to set MaxBackends.  However, that approach causes
problems for Hot Standby, which needs both MaxBackends and the
size of the lock table to be the same on the standby as on the
master, yet it may not be desirable to run the same bgworkers in
both places.  9.3 handles that by failing to notice the problem,
which will probably work fine in nearly all cases anyway, but is
not theoretically sound.

A further problem with simply counting the number of registered
workers is that new workers can't be registered without a
postmaster restart.  This is inconvenient for administrators,
since bouncing the postmaster causes an interruption of service.
Moreover, there are a number of applications for background
processes where, by necessity, the background process must be
started on the fly (e.g. parallel query).  While this patch
doesn't actually make it possible to register new background
workers after startup time, it's a necessary prerequisite.

Patch by me.  Review by Michael Paquier.

6bc8ef0b

Get rid of pg_class.reltoastidxid. · 2ef085d0

Committed by Fujii Masao on Jul 04, 2013

Treat a TOAST index just the same as a normal one and get the OID
of the TOAST index from pg_index rather than pg_class.reltoastidxid.
This change allows us to handle multiple TOAST indexes, which is
required infrastructure for the upcoming REINDEX CONCURRENTLY feature.

Patch by Michael Paquier, reviewed by Andres Freund and me.

2ef085d0

Fix handling of auto-updatable views on inherited tables. · 5530a826

Committed by Tom Lane on Jul 03, 2013

An INSERT into such a view should work just like an INSERT into its base
table, ie the insertion should go directly into that table ... not be
duplicated into each child table, as was happening before, per bug #8275
from Rushabh Lathia.  On the other hand, the current behavior for
UPDATE/DELETE seems reasonable: the update/delete traverses the child
tables, or not, depending on whether the view specifies ONLY or not.
Add some regression tests covering this area.

Dean Rasheed

5530a826

03 Jul, 2013 2 commits

Unbreak postmaster restart-after-crash sequence · 620935ad

Committed by Alvaro Herrera on Jul 03, 2013

In patch 82233ce7, AbortStartTime wasn't being reset appropriately
after the restart sequence, causing subsequent iterations through
ServerLoop to malfunction.

620935ad

Add support for multiple kinds of external toast datums. · 36820250

Committed by Robert Haas on Jul 02, 2013

To that end, support tags rather than lengths for external datums.
As an example of how this can be used, add support for "indirect"
tuples which point to some externally allocated memory containing
a toast tuple.  Similar infrastructure could be used for other
purposes, including, perhaps, support for alternative compression
algorithms.

Andres Freund, reviewed by Hitoshi Harada and myself

36820250

02 Jul, 2013 3 commits

Use an MVCC snapshot, rather than SnapshotNow, for catalog scans. · 568d4138

Committed by Robert Haas on Jul 02, 2013

SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.

The major issue that has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.

Patch by me. Review by Michael Paquier and Andres Freund.

568d4138
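
The snapshot-reuse mitigation in 568d4138 follows a common caching pattern, sketched here in Python. The class and method names are hypothetical, and the sketch models only the "retake after invalidation" rule, not PostgreSQL's snapshot machinery or locking.

```python
import itertools


class CatalogSnapshotCache:
    """Reuse one snapshot across scans until an invalidation is processed."""

    def __init__(self, take_snapshot):
        self._take_snapshot = take_snapshot  # the expensive operation
        self._snapshot = None                # None means "must retake"

    def invalidate(self):
        """Call when invalidation messages have been processed."""
        self._snapshot = None

    def get(self):
        """Return the cached snapshot, taking a fresh one only if needed."""
        if self._snapshot is None:
            self._snapshot = self._take_snapshot()
        return self._snapshot


# Stand-in for the expensive call: each "snapshot" is just a counter value.
counter = itertools.count(1)
cache = CatalogSnapshotCache(lambda: next(counter))
first = cache.get()
second = cache.get()   # reused, no new snapshot taken
cache.invalidate()
third = cache.get()    # retaken after invalidation
```

The cost of taking a snapshot is paid once per invalidation rather than once per scan, which is the trade-off the commit message argues is safe given the existing catcache guarantees.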

Add a convenience routine makeFuncCall to reduce duplication. · 0d22987a

Committed by Robert Haas on Jul 01, 2013

David Fetter and Andrew Gierth, reviewed by Jeevan Chalke

0d22987a

Add timezone offset output option to to_char() · 7408c5d2

Committed by Bruce Momjian on Jul 01, 2013

Add ability for to_char() to output the timezone's UTC offset (OF).  We
already have the ability to return the timezone abbreviation (TZ/tz).
Per request from Andrew Dunstan.

7408c5d2

01 Jul, 2013 2 commits

Optimize pglz compressor for small inputs. · 031cc55b

Committed by Heikki Linnakangas on Jul 01, 2013

The pglz compressor has a significant startup cost, because it has to
initialize to zeros the history-tracking hash table. On a 64-bit system, the
hash table was 64kB in size. While clearing memory is pretty fast, for very
short inputs the relative cost of that was quite large.

This patch alleviates that in two ways. First, instead of storing pointers
in the hash table, store 16-bit indexes into the hist_entries array. That
slashes the size of the hash table to 1/2 or 1/4 of the original, depending
on the pointer width. Secondly, adjust the size of the hash table based on
input size. For very small inputs, you don't need a large hash table to
avoid collisions.

Review by Amit Kapila.

031cc55b
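
The size figures in 031cc55b can be checked with a little arithmetic, sketched in Python below. The entry count is inferred from the stated 64kB of 8-byte pointers; the function names and the sizing rule in choose_hash_size are hypothetical and only illustrate the idea of scaling the table with the input.

```python
def table_bytes(entries, bytes_per_entry):
    """Memory that must be zeroed before compression can start."""
    return entries * bytes_per_entry


ENTRIES = 64 * 1024 // 8          # 64 kB of 8-byte pointers -> 8192 entries

ptr64 = table_bytes(ENTRIES, 8)   # pointers on a 64-bit system
ptr32 = table_bytes(ENTRIES, 4)   # pointers on a 32-bit system
idx16 = table_bytes(ENTRIES, 2)   # 16-bit indexes on either system


def choose_hash_size(input_len, max_entries=ENTRIES, min_entries=512):
    """Pick a power-of-two table size that grows with the input length.

    The thresholds are hypothetical; only the idea comes from the commit.
    """
    size = min_entries
    while size < max_entries and size < input_len * 4:
        size *= 2
    return size


print(ptr64, ptr32, idx16, choose_hash_size(100))
```

Switching to 16-bit indexes gives a table one quarter the size of the 64-bit pointer table and one half the size of the 32-bit one, matching the "1/2 or 1/4" in the message, and a short input then zeroes an even smaller table.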

Retry short writes when flushing WAL. · 79ce29c7

Committed by Heikki Linnakangas on Jul 01, 2013

We don't normally bother retrying when the number of bytes written by
write() is short of what was requested. It is generally assumed that a
write() to disk doesn't return short, unless you run out of disk space.
While writing the WAL, however, it seems prudent to try a bit harder,
because a failure leads to PANIC. The write() is also much larger than most
write()s in the backend (up to wal_buffers), so there's more room for
surprises.

Also retry on EINTR. All signals used in the backend are flagged SA_RESTART
nowadays, so it shouldn't happen, but better to be defensive.

79ce29c7
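
The retry logic described in 79ce29c7 can be sketched in Python as a loop around os.write. The function name write_fully and the injectable write parameter are assumptions made so the behaviour can be exercised without a real disk; this is not the C code from the commit.

```python
import os


def write_fully(fd, data, write=os.write):
    """Write all of data to fd, retrying after short writes and EINTR.

    Raises OSError if a write makes no progress, which is treated here
    like running out of disk space.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        try:
            written = write(fd, view[total:])
        except InterruptedError:
            continue  # interrupted by a signal before writing: try again
        if written <= 0:
            raise OSError("write made no progress")
        total += written
    return total
```

A caller that would otherwise treat any short write as fatal can call write_fully instead and only fail when no progress at all is possible.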

29 Jun, 2013 3 commits

Inline ginCompareItemPointers function for speed. · ee655655

Committed by Heikki Linnakangas on Jun 29, 2013

The ginCompareItemPointers function is called heavily in GIN index scans;
inlining it speeds up some kinds of queries a lot.

ee655655

Change errcode for lock_timeout to match NOWAIT · d51b2710

Committed by Simon Riggs on Jun 29, 2013

Set errcode to ERRCODE_LOCK_NOT_AVAILABLE

Zoltán Böszörményi

d51b2710

ALTER TABLE ... ALTER CONSTRAINT for FKs · f177cbfe

Committed by Simon Riggs on Jun 29, 2013

Allow constraint attributes to be altered,
so the default setting of NOT DEFERRABLE
can be altered to DEFERRABLE and back.

Review by Abhijit Menon-Sen

f177cbfe