Commits · ed0b409d22346b1b027a4c2099ca66984d94b6dd · Greenplum / Gpdb

25 Nov, 2011 1 commit

Move "hot" members of PGPROC into a separate PGXACT array. · ed0b409d

Committed by Robert Haas on Nov 25, 2011

This speeds up snapshot-taking and reduces ProcArrayLock contention.
Also, the PGPROC (and PGXACT) structures used by two-phase commit are
now allocated as part of the main array, rather than in a separate
array, and we keep ProcArray sorted in pointer order. These changes
are intended to minimize the number of cache lines that must be pulled
in to take a snapshot, and testing shows a substantial increase in
performance on both read and write workloads at high concurrencies.
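
A minimal C sketch of the layout idea described above, not the actual PostgreSQL definitions: the few fields a snapshot scan reads live in a small, dense array, apart from the larger per-process struct. All `demo_` names and the field choices are hypothetical, and the scan ignores XID wraparound for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "cold" per-process state: large, rarely read by other backends. */
typedef struct
{
    char        cold_state[512];
    int         pgprocno;
} demo_pgproc;

/* Hypothetical "hot" state: only what a snapshot scan needs to look at. */
typedef struct
{
    uint32_t    xid;            /* 0 means no transaction is running */
    uint32_t    xmin;
    uint8_t     flags;
} demo_pgxact;

/*
 * Return the smallest nonzero xid in the hot array, or 0 if every slot is
 * idle.  Only the dense demo_pgxact array is touched, never demo_pgproc.
 */
static uint32_t
demo_oldest_running_xid(const demo_pgxact *xacts, size_t n)
{
    uint32_t    oldest = 0;

    assert(xacts != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        uint32_t    xid = xacts[i].xid;

        if (xid != 0 && (oldest == 0 || xid < oldest))
            oldest = xid;
    }
    return oldest;
}
```

Because each hot entry is far smaller than the cold struct, a scan over many backends should pull in noticeably fewer cache lines than a walk over the full per-process structs would.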

Pavan Deolasee, Heikki Linnakangas, Robert Haas

ed0b409d

19 Nov, 2011 1 commit

Avoid floating-point underflow while tracking buffer allocation rate. · 40d35036

Committed by Tom Lane on Nov 19, 2011

When the system is idle for a while after activity, the "smoothed_alloc"
state variable in BgBufferSync converges slowly to zero. With standard
IEEE float arithmetic this results in several iterations with denormalized
values, which causes kernel traps and annoying log messages on some
poorly-designed platforms. There's no real need to track such small values
of smoothed_alloc, so we can prevent the kernel traps by forcing it to zero
as soon as it's too small to be interesting for our purposes. This issue
is purely cosmetic, since the iterations don't happen fast enough for the
kernel traps to pose any meaningful performance problem, but still it seems
worth shutting up the log messages.
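
The fix described above can be sketched in C as a moving average that is forced to zero once it becomes uninterestingly small. This is an illustration only: the function name, the cutoff value, and the exact averaging formula are assumptions, not the code in BgBufferSync.

```c
#include <assert.h>

/* Hypothetical cutoff: anything below this is treated as "no allocations". */
#define DEMO_SMOOTHED_ALLOC_FLOOR 1e-3f

/*
 * One step of an exponential moving average.  Without the clamp, an idle
 * system would let the value decay through ever smaller numbers until it
 * reached the denormalized range; with it, the value snaps to exactly zero.
 */
static float
demo_smooth_alloc(float smoothed, float recent, float smoothing_samples)
{
    assert(smoothing_samples >= 1.0f);

    smoothed += (recent - smoothed) / smoothing_samples;
    if (smoothed < DEMO_SMOOTHED_ALLOC_FLOOR)
        smoothed = 0.0f;
    return smoothed;
}
```

Calling `demo_smooth_alloc(s, 0.0f, 16.0f)` repeatedly on an idle system reaches exactly zero after a bounded number of steps instead of lingering among tiny values.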

The kernel log messages were previously reported by a number of people,
but kudos to Greg Matthews for tracking down exactly where they were coming
from.

40d35036

11 Nov, 2011 1 commit

Revert removal of trace_userlocks, because userlocks aren't gone. · 71b2b657

Committed by Robert Haas on Nov 10, 2011

This reverts commit 0180bd61.
contrib/userlock is gone, but user-level locking still exists,
and is exposed via the pg_advisory* family of functions.

71b2b657

02 Nov, 2011 4 commits

Derive oldestActiveXid at correct time for Hot Standby. · 86e33648

Committed by Simon Riggs on Nov 02, 2011

There was a timing window between when oldestActiveXid was derived
and when it should have been derived that only shows itself under
heavy load. Move code around to ensure correct timing of derivation.
No change to StartupSUBTRANS() code, which is where this failed.

Bug report by Chris Redekop

86e33648

Start Hot Standby faster when initial snapshot is incomplete. · 10b7c686

Committed by Simon Riggs on Nov 02, 2011

If the initial snapshot had overflowed, then we can start whenever
the latest snapshot is empty or not overflowed, or, as we did already,
when the xmin on the primary is higher than the xmax of our starting
snapshot, which proves we have full snapshot data.

Bug report by Chris Redekop

10b7c686

Initialize myProcLocks queues just once, at postmaster startup. · c2891b46

Committed by Robert Haas on Nov 01, 2011

In assert-enabled builds, we assert during the shutdown sequence that
the queues have been properly emptied, and during process startup that
we are inheriting empty queues.  In non-assert enabled builds, we just
save a few cycles.

c2891b46

Split work of bgwriter between 2 processes: bgwriter and checkpointer. · 806a2aee

Committed by Simon Riggs on Nov 01, 2011

bgwriter is now a much less important process, responsible for page
cleaning duties only. checkpointer is now responsible for checkpoints
and so has a key role in shutdown. Later patches will correct doc
references to the now old idea that bgwriter performs checkpoints.
This has a beneficial effect on performance at high write rates, but it is
mainly a refactoring: it simplifies the previously tortuous code that was
required to let page cleaning and checkpointing time-slice in the same
process, which should make later changes for power reduction easier.

Patch by me, Review by Dickson Guedes

806a2aee

29 Oct, 2011 1 commit

Allow hint bits to be set sooner for temporary and unlogged tables. · 53f1ca59

Committed by Robert Haas on Oct 28, 2011

We need not wait until the commit record is durably on disk, because
in the event of a crash the page we're updating with hint bits will
be gone anyway. Per off-list report from Heikki Linnakangas, the wait
can significantly degrade the performance of unlogged tables; I was
able to show a 2x speedup from this patch on a pgbench run with scale
factor 15. In practice, this will mostly help small, heavily updated
tables, because on larger tables you're unlikely to run into the same
row again before the commit record makes it out to disk.

53f1ca59

28 Oct, 2011 1 commit

Fix the number of lwlocks needed by the "fast path" lock patch. It needs · cbf65509

Committed by Heikki Linnakangas on Oct 27, 2011

one lock per backend or auxiliary process - the need for a lock for each
aux processes was not accounted for in NumLWLocks(). No-one noticed,
because the three locks needed for the three aux processes fit into the
few extra lwlocks we allocate for 3rd party modules that don't call
RequestAddinLWLocks() (NUM_USER_DEFINED_LWLOCKS, 4 by default).

cbf65509

23 Oct, 2011 1 commit

Support synchronization of snapshots through an export/import procedure. · bb446b68

Committed by Tom Lane on Oct 22, 2011

A transaction can export a snapshot with pg_export_snapshot(), and then
others can import it with SET TRANSACTION SNAPSHOT.  The data does not
leave the server so there are no security issues.  A snapshot can only
be imported while the exporting transaction is still running, and there
are some other restrictions.

I'm not totally convinced that we've covered all the bases for SSI (true
serializable) mode, but it works fine for lesser isolation modes.

Joachim Wieland, reviewed by Marko Tiikkaja, and rather heavily modified
by Tom Lane

bb446b68

21 Oct, 2011 1 commit

Simplify and improve ProcessStandbyHSFeedbackMessage logic. · b4a0223d

Committed by Tom Lane on Oct 20, 2011

There's no need to clamp the standby's xmin to be greater than
GetOldestXmin's result; if there were any such need this logic would be
hopelessly inadequate anyway, because it fails to account for
within-database versus cluster-wide values of GetOldestXmin. So get rid of
that, and just rely on sanity-checking that the xmin is not wrapped around
relative to the nextXid counter. Also, don't reset the walsender's xmin if
the current feedback xmin is indeed out of range; that just creates more
problems than we already had. Lastly, don't bother to take the
ProcArrayLock; there's no need to do that to set xmin.
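
The wraparound sanity check mentioned above relies on comparing transaction IDs in circular (modulo 2^32) order. A simplified C sketch follows; the `demo_` names are hypothetical, and the real check also has to consider the XID epoch, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t demo_xid;

/* Hypothetical stand-in for the first XID that is not reserved. */
#define DEMO_FIRST_NORMAL_XID 3

/*
 * Circular comparison: a precedes or equals b when the signed 32-bit
 * difference is not positive.  This stays meaningful across counter
 * wraparound as long as the two values are within 2^31 of each other.
 */
static bool
demo_xid_precedes_or_equals(demo_xid a, demo_xid b)
{
    return (int32_t) (a - b) <= 0;
}

/* A feedback xmin is plausible only if it is not ahead of the next XID. */
static bool
demo_feedback_xmin_is_sane(demo_xid xmin, demo_xid next_xid)
{
    assert(next_xid >= DEMO_FIRST_NORMAL_XID);
    return demo_xid_precedes_or_equals(xmin, next_xid);
}
```

An xmin that fails this test is simply ignored in this sketch, which mirrors the message's point that resetting state on a bad value only creates more problems.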

Also improve the comments about this in GetOldestXmin itself.

b4a0223d

14 Oct, 2011 1 commit

Remove all "traces" of trace_userlocks, because userlocks were removed · 0180bd61

Committed by Bruce Momjian on Oct 13, 2011

in PG 8.2.

0180bd61

11 Oct, 2011 1 commit

Repair breakage in VirtualXactLock. · e76bcaba

Committed by Robert Haas on Oct 11, 2011

I broke this in commit 84e37126.  Report and
fix by Fujii Masao.

e76bcaba

27 Sep, 2011 1 commit

Allow snapshot references to still work during transaction abort. · 57eb0090

Committed by Tom Lane on Sep 26, 2011

In REPEATABLE READ (nee SERIALIZABLE) mode, an attempt to do
GetTransactionSnapshot() between AbortTransaction and CleanupTransaction
failed, because GetTransactionSnapshot would recompute the transaction
snapshot (which is already wrong, given the isolation mode) and then
re-register it in the TopTransactionResourceOwner, leading to an Assert
because the TopTransactionResourceOwner should be empty of resources after
AbortTransaction. This is the root cause of bug #6218 from Yamamoto
Takashi. While changing plancache.c to avoid requesting a snapshot when
handling a ROLLBACK masks the problem, I think this is really a snapmgr.c
bug: it's lower-level than the resource manager mechanism and should not be
shutting itself down before we unwind resource manager resources. However,
just postponing the release of the transaction snapshot until cleanup time
didn't work because of the circular dependency with
TopTransactionResourceOwner. Fix by managing the internal reference to
that snapshot manually instead of depending on TopTransactionResourceOwner.
This saves a few cycles as well as making the module layering more
straightforward. predicate.c's dependencies on TopTransactionResourceOwner
go away too.

I think this is a longstanding bug, but there's no evidence that it's more
than a latent bug, so it doesn't seem worth any risk of back-patching.

57eb0090

24 Sep, 2011 1 commit

Memory barrier support for PostgreSQL. · 0c8eda62

Committed by Robert Haas on Sep 23, 2011

This is not actually used anywhere yet, but it gets the basic
infrastructure in place.  It is fairly likely that there are bugs, and
support for some important platforms may be missing, so we'll need to
refine this as we go along.

0c8eda62

12 Sep, 2011 1 commit

Remove many -Wcast-qual warnings · 1b81c2fe

Committed by Peter Eisentraut on Sep 11, 2011

This addresses only those cases that are easy to fix by adding or
moving a const qualifier or removing an unnecessary cast.  There are
many more complicated cases remaining.

1b81c2fe

10 Sep, 2011 1 commit

Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h. · a7801b62

Committed by Tom Lane on Sep 09, 2011

As per my recent proposal, this refactors things so that these typedefs and
macros are available in a header that can be included in frontend-ish code.
I also changed various headers that were undesirably including
utils/timestamp.h to include datatype/timestamp.h instead.  Unsurprisingly,
this showed that half the system was getting utils/timestamp.h by way of
xlog.h.

No actual code changes here, just header refactoring.

a7801b62

04 Sep, 2011 1 commit

Clean up the #include mess a little. · 1609797c

Committed by Tom Lane on Sep 04, 2011

walsender.h should depend on xlog.h, not vice versa. (Actually, the
inclusion was circular until a couple hours ago, which was even sillier;
but Bruce broke it in the expedient rather than logically correct
direction.) Because of that poor decision, plus blind application of
pgrminclude, we had a situation where half the system was depending on
xlog.h to include such unrelated stuff as array.h and guc.h. Clean up
the header inclusion, and manually revert a lot of what pgrminclude had
done so things build again.

This episode reinforces my feeling that pgrminclude should not be run
without adult supervision. Inclusion changes in header files in particular
need to be reviewed with great care. More generally, it'd be good if we
had a clearer notion of module layering to dictate which headers can sanely
include which others ... but that's a big task for another day.

1609797c

01 Sep, 2011 1 commit

Remove unnecessary #include references, per pgrminclude script. · 6416a82a

Committed by Bruce Momjian on Sep 01, 2011

6416a82a

29 Aug, 2011 1 commit

Improve spinlock performance for HP-UX, ia64, non-gcc. · c01c25fb

Committed by Robert Haas on Aug 29, 2011

At least on this architecture, it's very important to spin on a
non-atomic instruction and only retry the atomic once it appears
that it will succeed.  To fix this, split TAS() into two macros:
TAS(), for trying to grab the lock the first time, and TAS_SPIN(),
for spinning until we get it.  TAS_SPIN() defaults to same as TAS(),
but we can override it when we know there's a better way.
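
The TAS()/TAS_SPIN() split can be illustrated with portable C11 atomics. This is a sketch under assumptions, not the s_lock.h code: the `demo_` names are hypothetical, and a relaxed atomic load stands in for the plain non-atomic read the message refers to.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct
{
    atomic_int  locked;         /* 0 = free, 1 = held */
} demo_lock;

/* First attempt: one atomic exchange.  Returns 0 on success, nonzero if held. */
static inline int
demo_tas(demo_lock *l)
{
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire);
}

/*
 * Spin attempt: look at the lock with a cheap read first, and only retry
 * the expensive atomic exchange once the lock appears to be free.
 */
static inline int
demo_tas_spin(demo_lock *l)
{
    if (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
        return 1;
    return demo_tas(l);
}

static void
demo_acquire(demo_lock *l)
{
    if (demo_tas(l) == 0)
        return;                 /* got it on the first try */
    while (demo_tas_spin(l) != 0)
        ;                       /* busy-wait; real code would back off */
}

static void
demo_release(demo_lock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) != 0);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

While the lock is held, waiters in this sketch only read the lock word, so they avoid issuing atomic read-modify-write operations on it until it appears free.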

It's likely that some of the other cases in s_lock.h require
similar treatment, but this is the only one we've got conclusive
evidence for at present.

c01c25fb

27 Aug, 2011 1 commit

Add missing includes after pgrminclude run. · f261deb4

Committed by Bruce Momjian on Aug 26, 2011

f261deb4

23 Aug, 2011 1 commit

Typo fix. · 74889364

Committed by Robert Haas on Aug 22, 2011

74889364

18 Aug, 2011 1 commit

Remove obsolete README file. · 24bf1552

Committed by Robert Haas on Aug 18, 2011

Perhaps we ought to add some other kind of documentation here instead,
but for now let's get rid of this woefully obsolete description of the
sinval machinery.

24bf1552

15 Aug, 2011 1 commit

Add "Reason code" prefix to internal SSI error messages · e5475a80

Committed by Peter Eisentraut on Aug 15, 2011

This makes it clearer that the error message is perhaps not supposed
to be understood by users, and it also makes it somewhat clearer that
it was not accidentally omitted from translation.

Idea from Heikki Linnakangas, except that we don't mark "Reason code"
for translation at this point, because that would make the
implementation too cumbersome.

e5475a80

11 Aug, 2011 1 commit

Change the autovacuum launcher to use WaitLatch instead of a poll loop. · 4dab3d5a

Committed by Tom Lane on Aug 10, 2011

In pursuit of this (and with the expectation that WaitLatch will be needed
in more places), convert the latch field that was already added to PGPROC
for sync rep into a generic latch that is activated for all PGPROC-owning
processes, and change many of the standard backend signal handlers to set
that latch when a signal happens. This will allow WaitLatch callers to be
wakened properly by these signals.

In passing, fix a whole bunch of signal handlers that had been hacked to do
things that might change errno, without adding the necessary save/restore
logic for errno. Also make some minor fixes in unix_latch.c, and clean
up bizarre and unsafe scheme for disowning the process's latch. Much of
this has to be back-patched into 9.1.
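
The errno save/restore pattern for signal handlers mentioned above looks roughly like this in C. The handler and flag names are hypothetical, and `close(-1)` merely stands in for any handler work that might overwrite errno.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t demo_got_signal = 0;

/*
 * A handler can interrupt code that is about to inspect errno, so it must
 * leave errno exactly as it found it.
 */
static void
demo_handler(int signo)
{
    int         save_errno = errno;

    (void) signo;
    demo_got_signal = 1;
    (void) close(-1);           /* fails and sets errno to EBADF */

    errno = save_errno;
}

static void
demo_install_handler(void)
{
    void        (*prev) (int) = signal(SIGUSR1, demo_handler);

    assert(prev != SIG_ERR);
    (void) prev;
}
```

Without the final assignment, code interrupted between a failing system call and its errno check could see EBADF from the handler instead of its own error.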

Peter Geoghegan, with additional work by Tom

4dab3d5a

10 Aug, 2011 1 commit

Documentation improvement and minor code cleanups for the latch facility. · 4e15a4db

Committed by Tom Lane on Aug 09, 2011

Improve the documentation around weak-memory-ordering risks, and do a pass
of general editorialization on the comments in the latch code. Make the
Windows latch code more like the Unix latch code where feasible; in
particular provide the same Assert checks in both implementations.
Fix poorly-placed WaitLatch call in syncrep.c.

This patch resolves, for the moment, concerns around weak-memory-ordering
bugs in latch-related code: we have documented the restrictions and checked
that existing calls meet them. In 9.2 I hope that we will install suitable
memory barrier instructions in SetLatch/ResetLatch, so that their callers
don't need to be quite so careful.

4e15a4db

05 Aug, 2011 1 commit

Create VXID locks "lazily" in the main lock table. · 84e37126

Committed by Robert Haas on Aug 04, 2011

Instead of entering them on transaction startup, we materialize them
only when someone wants to wait, which will occur only during CREATE
INDEX CONCURRENTLY. In Hot Standby mode, the startup process must also
be able to probe for conflicting VXID locks, but the lock need never be
fully materialized, because the startup process does not use the normal
lock wait mechanism. Since most VXID locks never need to touch the
lock manager partition locks, this can significantly reduce blocking
contention on read-heavy workloads.

Patch by me. Review by Jeff Davis.

84e37126

03 Aug, 2011 2 commits

Move CheckRecoveryConflictDeadlock() call to a safer place. · ac36e6f7

Committed by Tom Lane on Aug 02, 2011

This kluge was inserted in a spot apparently chosen at random: the lock
manager's state is not yet fully set up for the wait, and in particular
LockWaitCancel hasn't been armed by setting lockAwaited, so the ProcLock
will not get cleaned up if the ereport is thrown.  This seems to not cause
any observable problem in trivial test cases, because LockReleaseAll will
silently clean up the debris; but I was able to cause failures with tests
involving subtransactions.

Fixes breakage induced by commit c85c9414.
Back-patch to all affected branches.

ac36e6f7

Fix incorrect initialization of ProcGlobal->startupBufferPinWaitBufId. · 2e53bd55

Committed by Tom Lane on Aug 02, 2011

It was initialized in the wrong place and to the wrong value.  With bad
luck this could result in incorrect query-cancellation failures in hot
standby sessions, should a HS backend be holding pin on buffer number 1
while trying to acquire a lock.

2e53bd55

01 Aug, 2011 1 commit

Minor stylistic corrections. · 85b436f7

Committed by Robert Haas on Aug 01, 2011

85b436f7

30 Jul, 2011 1 commit

Reduce sinval synchronization overhead. · b4fbe392

Committed by Robert Haas on Jul 29, 2011

Testing shows that the overhead of acquiring and releasing
SInvalReadLock and msgNumLock on high-core count boxes can waste a lot
of CPU time and hurt performance.  This patch adds a per-backend flag
that allows us to skip all that locking in most cases.  Further
testing shows that this improves performance even when sinval traffic
is very high.

Patch by me.  Review and testing by Noah Misch.

b4fbe392

28 Jul, 2011 1 commit

Minor message style adjustment · 0fe81508

Committed by Peter Eisentraut on Jul 27, 2011

0fe81508

20 Jul, 2011 1 commit

Some refinement for the "fast path" lock patch. · 8e5ac74c

Committed by Robert Haas on Jul 19, 2011

1. In GetLockStatusData, avoid initializing instance before we've ensured
that the array is large enough.  Otherwise, if repalloc moves the block
around, we're hosed.

2. Add the word "Relation" to the name of some identifiers, to avoid
assuming that the fast-path mechanism will only ever apply to relations
(though these particular parts certainly will).  Some of the macros
could possibly use similar treatment, but the names are getting awfully
long already.

3. Add a missing word to comment in AtPrepare_Locks().
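
Item 1 above is the classic "pointer taken before resize" hazard. A generic C sketch using `realloc` in place of `repalloc` follows; the struct and function names are hypothetical and unrelated to the real GetLockStatusData code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
    int         pid;
    int         mode;
} demo_instance;

typedef struct
{
    demo_instance *items;
    size_t      nelems;
    size_t      maxelems;
} demo_array;

/*
 * Append one element.  The slot pointer is formed only after any resize,
 * because realloc may move the block and invalidate earlier pointers.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
static int
demo_append(demo_array *a, int pid, int mode)
{
    demo_instance *slot;

    if (a->nelems >= a->maxelems)
    {
        size_t      newmax = a->maxelems ? a->maxelems * 2 : 4;
        demo_instance *grown = realloc(a->items, newmax * sizeof(demo_instance));

        if (grown == NULL)
            return -1;
        a->items = grown;
        a->maxelems = newmax;
    }

    slot = &a->items[a->nelems];    /* safe: taken after the resize */
    slot->pid = pid;
    slot->mode = mode;
    a->nelems++;
    assert(a->nelems <= a->maxelems);
    return 0;
}
```

Taking `slot` before the capacity check would leave it pointing into freed memory whenever the block moves, which is the failure mode the message describes.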

8e5ac74c

19 Jul, 2011 1 commit

Change debug message from ereport to elog · 30f85453

Committed by Peter Eisentraut on Jul 19, 2011

30f85453

18 Jul, 2011 3 commits

Create a "fast path" for acquiring weak relation locks. · 3cba8999

Committed by Robert Haas on May 28, 2011

When an AccessShareLock, RowShareLock, or RowExclusiveLock is requested
on an unshared database relation, and we can verify that no conflicting
locks can possibly be present, record the lock in a per-backend queue,
stored within the PGPROC, rather than in the primary lock table.  This
eliminates a great deal of contention on the lock manager LWLocks.

This patch also refactors the interface between GetLockStatusData() and
pg_lock_status() to be a bit more abstract, so that we don't rely so
heavily on the lock manager's internal representation details.  The new
fast path lock structures don't have a LOCK or PROCLOCK structure to
return, so we mustn't depend on that for purposes of listing outstanding
locks.

Review by Jeff Davis.

3cba8999

Further thoughts about temp_file_limit patch. · 9473bb96

Committed by Tom Lane on Jul 17, 2011

Move FileClose's decrement of temporary_files_size up, so that it will be
executed even if elog() throws an error.  This is reasonable since if the
unlink() fails, the fact the file is still there is not our fault, and we
are going to forget about it anyhow.  So we won't count it against
temp_file_limit anymore.

Update fileSize and temporary_files_size correctly in FileTruncate.
We probably don't have any places that truncate temp files, but fd.c
surely should not assume that.

9473bb96

Add temp_file_limit GUC parameter to constrain temporary file space usage. · 23e5b16c

Committed by Tom Lane on Jul 17, 2011

The limit is enforced against the total amount of temp file space used by
each session.

Mark Kirkwood, reviewed by Cédric Villemain and Tatsuo Ishii

23e5b16c

17 Jul, 2011 2 commits

Replace errdetail("%s", ...) with errdetail_internal("%s", ...). · 1af37ec9

Committed by Tom Lane on Jul 16, 2011

There may be some other places where we should use errdetail_internal,
but they'll have to be evaluated case-by-case. This commit just hits
a bunch of places where invoking gettext is obviously a waste of cycles.

1af37ec9

Use errdetail_internal() for SSI transaction cancellation details. · 3ee7c871

Committed by Tom Lane on Jul 16, 2011

Per discussion, these seem too technical to be worth translating.

Kevin Grittner

3ee7c871

09 Jul, 2011 1 commit

Try to acquire relation locks in RangeVarGetRelid. · 4240e429

Committed by Robert Haas on Jul 08, 2011

In the previous coding, we would look up a relation in RangeVarGetRelid,
lock the resulting OID, and then AcceptInvalidationMessages(). While
this was sufficient to ensure that we noticed any changes to the
relation definition before building the relcache entry, it didn't
handle the possibility that the name we looked up no longer referenced
the same OID. This was particularly problematic in the case where a
table had been dropped and recreated: we'd latch on to the entry for
the old relation and fail later on. Now, we acquire the relation lock
inside RangeVarGetRelid, and retry the name lookup if we notice that
invalidation messages have been processed meanwhile. Many operations
that would previously have failed with an error in the presence of
concurrent DDL will now succeed.
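
The lookup, lock, and recheck loop described above can be sketched as a single-threaded C simulation. Everything here is hypothetical: the `demo_catalog` struct, the invalidation counter, and the simulated concurrent drop/recreate are stand-ins, not the RangeVarGetRelid implementation.

```c
#include <assert.h>

typedef unsigned int demo_oid;

typedef struct
{
    demo_oid    current_oid;        /* what the name resolves to right now */
    unsigned long inval_count;      /* bumped whenever the mapping changes */
    int         pending_recreates;  /* simulated concurrent DROP + CREATE */
    demo_oid    locked_oid;         /* OID we hold a lock on, 0 if none */
} demo_catalog;

/* Taking the lock may block, and the name can be re-pointed meanwhile. */
static void
demo_lock_relation(demo_catalog *c, demo_oid oid)
{
    if (c->pending_recreates > 0)
    {
        c->pending_recreates--;
        c->current_oid++;
        c->inval_count++;
    }
    c->locked_oid = oid;
}

static void
demo_unlock_relation(demo_catalog *c, demo_oid oid)
{
    assert(c->locked_oid == oid);
    c->locked_oid = 0;
}

/* Resolve the name and lock it, retrying if the mapping changed under us. */
static demo_oid
demo_lookup_and_lock(demo_catalog *c)
{
    for (;;)
    {
        unsigned long seen = c->inval_count;
        demo_oid    oid = c->current_oid;

        demo_lock_relation(c, oid);
        if (seen == c->inval_count)
            return oid;             /* nothing changed: the lock is valid */
        demo_unlock_relation(c, oid);
    }
}
```

The loop ends only once a lock was obtained with no invalidation in between, so the returned OID is the one the name currently refers to.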

There is a good deal of work remaining to be done here: many callers
of RangeVarGetRelid still pass NoLock for one reason or another. In
addition, nothing in this patch guards against the possibility that
the meaning of an unqualified name might change due to the creation
of a relation in a schema earlier in the user's search path than the
one where it was previously found. Furthermore, there's nothing at
all here to guard against similar race conditions for non-relations.
For all that, it's a start.

Noah Misch and Robert Haas

4240e429