提交 · 0c90442355fbbe785740669f63141f24674c8958 · Greenplum / Gpdb

13 12月, 2010 2 次提交

Reset all database-level stats in pgstat_recv_resetcounter(). · 0c904423

由 Tom Lane 提交于 12月 12, 2010

We were failing to zero out some pg_stat_database counters that have
been added since the initial pgstats coding.  This is a bug, but not
back-patching the fix since changing this behavior in a minor release
seems a cure worse than the disease.

Report and patch by Tomas Vondra.

0c904423

T
Make S_IRGRP etc available in mingw builds as well as MSVC. · 5132ad8b
由 Tom Lane 提交于 12月 12, 2010
```
(Hm, I wonder whether BCC defines them either...)

Also label dangling endifs a bit better in this area.
```
5132ad8b

12 12月, 2010 1 次提交

Provide a complete set of file-permission-bit macros in win32.h. · 1319002e

由 Tom Lane 提交于 12月 11, 2010

My previous patch exposed the fact that we didn't have these.  Those
hard-wired octal constants were actually wrong on Windows, not just
inconsistent.

1319002e

11 12月, 2010 5 次提交

R
Allow bidirectional copy messages in streaming replication mode. · d3d41469
由 Robert Haas 提交于 12月 11, 2010
```
Fujii Masao.  Review by Alvaro Herrera, Tom Lane, and myself.
```
d3d41469
M

Add required new port files to MSVC builds. · 20f39642
由 Magnus Hagander 提交于 12月 11, 2010

20f39642

Move a couple of initdb's subroutines into src/port/. · 67119992

由 Tom Lane 提交于 12月 10, 2010

mkdir_p and check_data_dir will be useful in CREATE TABLESPACE, since we
have agreed that that command should handle subdirectory creation just like
initdb creates the PGDATA directory. Push them into src/port/ so that they
are available to both initdb and the backend. Rename to pg_mkdir_p and
pg_check_dir, just to be on the safe side. Add FreeBSD's copyright notice
to pgmkdirp.c, since that's where the code came from originally (this
really should have been in initdb.c). Very marginal code/comment cleanup.

67119992

Use symbolic names not octal constants for file permission flags. · 04f4e10c

由 Tom Lane 提交于 12月 10, 2010

Purely cosmetic patch to make our coding standards more consistent ---
we were doing symbolic some places and octal other places. This patch
fixes all C-coded uses of mkdir, chmod, and umask. There might be some
other calls I missed. Inconsistency noted while researching tablespace
directory permissions issue.

04f4e10c

Fix efficiency problems in tuplestore_trim(). · 244407a7

由 Tom Lane 提交于 12月 10, 2010

The original coding in tuplestore_trim() was only meant to work efficiently
in cases where each trim call deleted most of the tuples in the store.
Which, in fact, was the pattern of the original usage with a Material node
supporting mark/restore operations underneath a MergeJoin. However,
WindowAgg now uses tuplestores and it has considerably less friendly
trimming behavior. In particular it can attempt to trim one tuple at a
time off a large tuplestore. tuplestore_trim() had O(N^2) runtime in this
situation because of repeatedly shifting its tuple pointer array. Fix by
avoiding shifting the array until a reasonably large number of tuples have
been deleted. This can waste some pointer space, but we do still reclaim
the tuples themselves, so the percentage wastage should be pretty small.

Per Jie Li's report of slow percent_rank() evaluation. cume_dist() and
ntile() would certainly be affected as well, along with any other window
function that has a moving frame start and requires reading substantially
ahead of the current row.

Back-patch to 8.4, where window functions were introduced. There's no
need to tweak it before that.

244407a7

10 12月, 2010 1 次提交

Eliminate O(N^2) behavior in parallel restore with many blobs. · 663fc32e

由 Tom Lane 提交于 12月 09, 2010

With hundreds of thousands of TOC entries, the repeated searches in
reduce_dependencies() become the dominant cost. Get rid of that searching
by constructing reverse-dependency lists, which we can do in O(N) time
during the fix_dependencies() preprocessing. I chose to store the reverse
dependencies as DumpId arrays for consistency with the forward-dependency
representation, and keep the previously-transient tocsByDumpId[] array
around to locate actual TOC entry structs quickly from dump IDs.

While this fixes the slow case reported by Vlad Arkhipov, there is still
a potential for O(N^2) behavior with sufficiently many tables:
fix_dependencies itself, as well as mark_create_done and
inhibit_data_for_failed_table, are doing repeated searches to deal with
table-to-table-data dependencies. Possibly this work could be extended
to deal with that, although the latter two functions are also used in
non-parallel restore where we currently don't run fix_dependencies.

Another TODO is that we fail to parallelize restore of multiple blobs
at all. This appears to require changes in the archive format to fix.

Back-patch to 9.0 where the problem was reported. 8.4 has potential issues
as well; but since it doesn't create a separate TOC entry for each blob,
it's at much less risk of having enough TOC entries to cause real problems.

663fc32e

09 12月, 2010 4 次提交

S

Self review of previous patch. Fix assumption that xmax >= xmin. · 9975c683
由 Simon Riggs 提交于 12月 09, 2010

9975c683

Reduce spurious Hot Standby conflicts from never-visible records. · b9075a6d

由 Simon Riggs 提交于 12月 09, 2010

Hot Standby conflicts only with tuples that were visible at
some point. So ignore tuples from aborted transactions or for
tuples updated/deleted during the inserting transaction when
generating the conflict transaction ids.

Following detailed analysis and test case by Noah Misch.
Original report covered btree delete records, correctly observed
by Heikki Linnakangas that this applies to other cases also.
Fix covers all sources of cleanup records via common code.

b9075a6d

Force default wal_sync_method to be fdatasync on Linux. · 576477e7

由 Tom Lane 提交于 12月 08, 2010

Recent versions of the Linux system header files cause xlogdefs.h to
believe that open_datasync should be the default sync method, whereas
formerly fdatasync was the default on Linux.  open_datasync is a bad
choice, first because it doesn't actually outperform fdatasync (in fact
the reverse), and second because we try to use O_DIRECT with it, causing
failures on certain filesystems (e.g., ext4 with data=journal option).
This part of the patch is largely per a proposal from Marti Raudsepp.
More extensive changes are likely to follow in HEAD, but this is as much
change as we want to back-patch.

Also clean up confusing code and incorrect documentation surrounding the
fsync_writethrough option.  Those changes shouldn't result in any actual
behavioral change, but I chose to back-patch them anyway to keep the
branches looking similar in this area.

In 9.0 and HEAD, also do some copy-editing on the WAL Reliability
documentation section.

Back-patch to all supported branches, since any of them might get used
on modern Linux versions.

576477e7

Optimize commit_siblings in two ways to improve group commit. · e620ee35

由 Simon Riggs 提交于 12月 08, 2010

First, avoid scanning the whole ProcArray once we know there
are at least commit_siblings active; second, skip the check
altogether if commit_siblings = 0.

Greg Smith

e620ee35

07 12月, 2010 3 次提交

Fix bugs in the hot standby known-assigned-xids tracking logic. If there's · 5a031a55

由 Heikki Linnakangas 提交于 12月 07, 2010

an old transaction running in the master, and a lot of transactions have
started and finished since, and a WAL-record is written in the gap between
the creating the running-xacts snapshot and WAL-logging it, recovery will fail
with "too many KnownAssignedXids" error. This bug was reported by
Joachim Wieland on Nov 19th.

In the same scenario, when fewer transactions have started so that all the
xids fit in KnownAssignedXids despite the first bug, a more serious bug
arises. We incorrectly initialize the clog code with the oldest still running
transaction, and when we see the WAL record belonging to a transaction with
an XID larger than one that committed already before the checkpoint we're
recovering from, we zero the clog page containing the already committed
transaction, leading to data loss.

In hindsight, trying to track xids in the known-assigned-xids array before
seeing the running-xacts record was too complicated. To fix that, hold
XidGenLock while the running-xacts snapshot is taken and WAL-logged. That
ensures that no transaction can begin or end in that gap, so that in recvoery
we know that the snapshot contains all transactions running at that point in
WAL.

5a031a55

Add a stack overflow check to copyObject(). · 8b569280

由 Tom Lane 提交于 12月 06, 2010

There are some code paths, such as SPI_execute(), where we invoke
copyObject() on raw parse trees before doing parse analysis on them. Since
the bison grammar is capable of building heavily nested parsetrees while
itself using only minimal stack depth, this means that copyObject() can be
the front-line function that hits stack overflow before anything else does.
Accordingly, it had better have a check_stack_depth() call. I did a bit of
performance testing and found that this slows down copyObject() by only a
few percent, so the hit ought to be negligible in the context of complete
processing of a query.

Per off-list report from Toshihide Katayama. Back-patch to all supported
branches.

8b569280

Allow the low level COPY routines to read arbitrary numbers of fields. · af1a614e

由 Andrew Dunstan 提交于 12月 06, 2010

This doesn't involve any user-visible change in behavior, but will be
useful when the COPY routines are exposed to allow their use by Foreign
Data Wrapper routines, which will be able to use these routines to read
irregular CSV files, for example.

af1a614e

06 12月, 2010 3 次提交

H

Fix two typos, by Fujii Masao. · 95e42a2c
由 Heikki Linnakangas 提交于 12月 06, 2010

95e42a2c
P

Put only single space after "Sort Method:", for consistency · 951d7861
由 Peter Eisentraut 提交于 12月 06, 2010

951d7861

Reduce memory consumption inside inheritance_planner(). · d1001a78

由 Tom Lane 提交于 12月 05, 2010

Avoid eating quite so much memory for large inheritance trees, by
reclaiming the space used by temporary copies of the original parsetree and
range table, as well as the workspace needed during planning. The cost is
needing to copy the finished plan trees out of the child memory context.
Although this looks like it ought to slow things down, my testing shows
it actually is faster, apparently because fewer interactions with malloc()
are needed and/or we can do the work within a more readily cacheable amount
of memory. That result might be platform-dependent, but I'll take it.

Per a gripe from John Papandriopoulos, in which it was pointed out that the
memory consumption actually grew as O(N^2) for sufficiently many child
tables, since we were creating N copies of the N-element range table.

d1001a78

05 12月, 2010 1 次提交

Fix two small bugs in new gistget.c logic. · d1f5a92e

由 Tom Lane 提交于 12月 04, 2010

1. Complain, rather than silently doing nothing, if an "invalid" tuple
is found on a leaf page. Per off-list discussion with Heikki.

2. Fix oversight in code that removes a GISTSearchItem from the search
queue: we have to reset lastHeap if this was the last heap item in the
parent GISTSearchTreeItem. Otherwise subsequent additions will do the
wrong thing. This was probably masked in early testing because in typical
cases the parent item would now be completely empty and would be deleted on
next call. You'd need a queued non-leaf page at exactly the same distance
as a heap tuple to expose the bug.

d1f5a92e

04 12月, 2010 5 次提交

P
Make output width consistent for all ways of invoking a regression test · 387e468b
由 Peter Eisentraut 提交于 12月 04, 2010
```
run_schedule() and run_single_test() were using different output widths, which
would show up in bigcheck/bigtest, for example.
```
387e468b
T

Update comment to match later code changes. · e194a942
由 Tom Lane 提交于 12月 04, 2010

e194a942
T

Add external documentation for KNNGIST. · b576757d
由 Tom Lane 提交于 12月 03, 2010

b576757d

Put back gistgettuple's check for backwards scan request. · 04910a3a

由 Tom Lane 提交于 12月 03, 2010

On reflection it's a bad idea for the KNNGIST patch to have removed that.
We don't want it silently returning incorrect answers.

04910a3a

KNNGIST, otherwise known as order-by-operator support for GIST. · 55450687

由 Tom Lane 提交于 12月 03, 2010

This commit represents a rather heavily editorialized version of
Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1
patches.  I redid the opclass API to add a separate Distance method
instead of turning the Consistent method into an illogical mess,
fixed some bit-rot in the rbtree interfaces, and generally worked over
the code style and comments.

There's still no non-code documentation to speak of, but I'll work on
that separately.  Some contrib-module changes are also yet to come
(right now, point <-> point is the only KNN-ified operator).

Teodor Sigaev and Tom Lane

55450687

03 12月, 2010 10 次提交

R
Remove now-outdated mention of quotes being required in recovery.conf. · 5ef6c913
由 Robert Haas 提交于 12月 03, 2010
```
Noted by Itagaki Takahiro.
```
5ef6c913

Use GUC lexer for recovery.conf parsing. · 970a1868

由 Robert Haas 提交于 12月 03, 2010

This eliminates some crufty, special-purpose code and, as a non-trivial
side benefit, allows recovery.conf parameters to be unquoted.

Dimitri Fontaine, with review and cleanup by Alvaro Herrera, Itagaki
Takahiro, and me.

970a1868

H
Remove misleading comments. Move _Clone and _DeClone functions before · 9cea52a5
由 Heikki Linnakangas 提交于 12月 03, 2010
```
the "END OF FORMAT CALLBACKS" comment, because they are format callbacks too.
```
9cea52a5
I
Remove unnecessary string null-termination in pg_convert. · fd223c74
由 Itagaki Takahiro 提交于 12月 03, 2010
```
We can directly verify the unterminated input with pg_verify_mbstr_len.
```
fd223c74

Create core infrastructure for KNNGIST. · d583f10b

由 Tom Lane 提交于 12月 02, 2010

This is a heavily revised version of builtin_knngist_core-0.9. The
ordering operators are no longer mixed in with actual quals, which would
have confused not only humans but significant parts of the planner.
Instead, ordering operators are carried separately throughout planning and
execution.

Since the API for ambeginscan and amrescan functions had to be changed
anyway, this commit takes the opportunity to rationalize that a bit.
RelationGetIndexScan no longer forces a premature index_rescan call;
instead, callers of index_beginscan must call index_rescan too. Aside from
making the AM-side initialization logic a bit less peculiar, this has the
advantage that we do not make a useless extra am_rescan call when there are
runtime key values. AMs formerly could not assume that the key values
passed to amrescan were actually valid; now they can.

Teodor Sigaev and Tom Lane

d583f10b

A
Move private struct declaration to compress_io.c · d7e5d151
由 Alvaro Herrera 提交于 12月 02, 2010
```
Keep only the typedef in the header file.
```
d7e5d151
A

Remove trailing whitespace · 0025b76f
由 Alvaro Herrera 提交于 12月 02, 2010

0025b76f
A

Remove useless struct declaration · d67a39c3
由 Alvaro Herrera 提交于 12月 02, 2010

d67a39c3
A

Silence compiler · 7f4a7af2
由 Alvaro Herrera 提交于 12月 02, 2010

7f4a7af2

Refactor the pg_dump zlib code from pg_backup_custom.c to a separate file, · bf9aa490

由 Heikki Linnakangas 提交于 12月 02, 2010

to make it easier to reuse that code. There is no user-visible changes.

This is in preparation for the patch to add a new archive format, a directory,
to perform a custom-like dump but with each table being dumped to a separate
file (that in turn is a prerequisite for parallel pg_dump). This also makes it
easier to add new compression methods in the future, and makes the
pg_backup_custom.c code easier to read, when the compression-related code is
factored out.

Joachim Wieland, with heavy editorialization by me.

bf9aa490

01 12月, 2010 1 次提交

Prevent inlining a SQL function with multiple OUT parameters. · 225f0aa3

由 Tom Lane 提交于 12月 01, 2010

There were corner cases in which the planner would attempt to inline such
a function, which would result in a failure at runtime due to loss of
information about exactly what the result record type is. Fix by disabling
inlining when the function's recorded result type is RECORD. There might
be some sub-cases where inlining could still be allowed, but this is a
simple and backpatchable fix, so leave refinements for another day.
Per bug #5777 from Nate Carson.

Back-patch to all supported branches. 8.1 happens to avoid a core-dump
here, but it still does the wrong thing.

225f0aa3

30 11月, 2010 1 次提交

Simplify and speed up mapping of index opfamilies to pathkeys. · c0b5fac7

由 Tom Lane 提交于 11月 29, 2010

Formerly we looked up the operators associated with each index (caching
them in relcache) and then the planner looked up the btree opfamily
containing such operators in order to build the btree-centric pathkey
representation that describes the index's sort order. This is quite
pointless for btree indexes: we might as well just use the index's opfamily
information directly. That saves syscache lookup cycles during planning,
and furthermore allows us to eliminate the relcache's caching of operators
altogether, which may help in reducing backend startup time.

I added code to plancat.c to perform the same type of double lookup
on-the-fly if it's ever faced with a non-btree amcanorder index AM.
If such a thing actually becomes interesting for production, we should
replace that logic with some more-direct method for identifying the
corresponding btree opfamily; but it's not worth spending effort on now.

There is considerably more to do pursuant to my recent proposal to get rid
of sort-operator-based representations of sort orderings, but this patch
grabs some of the low-hanging fruit. I'll look at the remainder of that
work after the current commitfest.

c0b5fac7

29 11月, 2010 1 次提交

Move call to GetTopTransactionId() earlier in LockAcquire(), · ed78384a

由 Simon Riggs 提交于 11月 29, 2010

removing an infrequently occurring race condition in Hot Standby.
An xid must be assigned before a lock appears in shared memory,
rather than immediately after, else GetRunningTransactionLocks()
may see InvalidTransactionId, causing assertion failures during
lock processing on standby.

Bug report and diagnosis by Fujii Masao, fix by me.

ed78384a

28 11月, 2010 1 次提交
- B
  In libpq/Makefile, use OBJS += as a way to break up long link lines into · 1f48290a
  由 Bruce Momjian 提交于 11月 27, 2010
```
something that can be documented.
```
  1f48290a
27 11月, 2010 1 次提交

On further testing, PQping also needs an explicit check for AUTH_REQ. · 49cd8a3f

由 Tom Lane 提交于 11月 27, 2010

The pg_fe_sendauth code might fail if it can't handle the authentication
request message type --- if so, ping should still say the server is up.

49cd8a3f