Commits · d0024cd1881447fa7aed58db94df379e593c6630 · Greenplum / Gpdb

21 Dec, 2011 1 commit

Avoid crashing when we have problems unlinking files post-commit. · d0024cd1

Committed by Tom Lane on Dec 20, 2011

smgrdounlink takes care to not throw an ERROR if it fails to unlink
something, but that caution was rendered useless by commit
33960006, which put an smgrexists call in
front of it; smgrexists *does* throw error if anything looks funny, such
as getting a permissions error from trying to open the file. If that
happens post-commit, you get a PANIC, and what's worse the same logic
appears in the WAL replay code, so the database even fails to restart.

Restore the intended behavior by removing the smgrexists call --- it isn't
accomplishing anything that we can't do better by adjusting mdunlink's
ideas of whether it ought to warn about ENOENT or not.

Per report from Joseph Shraibman of unrecoverable crash after trying to
drop a table whose FSM fork had somehow gotten chmod'd to 000 permissions.
Backpatch to 8.4, where the bogus coding was introduced.
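
The behavior this commit restores can be illustrated with a minimal sketch. It is not PostgreSQL's `mdunlink`; `unlink_best_effort` and its `warn_on_enoent` flag are hypothetical names chosen for illustration. The point is that the removal path never probes the file first and never escalates a failure beyond a warning:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL's mdunlink): remove a file without
 * ever raising a hard error. Failures are reported as warnings only.
 * Returns true if the file is absent afterwards, false otherwise.
 */
static bool
unlink_best_effort(const char *path, bool warn_on_enoent)
{
    int         save_errno;

    if (unlink(path) == 0)
        return true;

    /* Save errno before calling anything that might overwrite it. */
    save_errno = errno;

    /* The caller decides whether a missing file is worth a warning. */
    if (save_errno != ENOENT || warn_on_enoent)
        fprintf(stderr, "WARNING: could not remove file \"%s\": %s\n",
                path, strerror(save_errno));

    /* A missing file still counts as "gone"; anything else is a failure. */
    return save_errno == ENOENT;
}
```

Because there is no separate existence check, a file that cannot be opened (for example, one with mode 000) cannot turn a post-commit cleanup into a fatal error in this sketch.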

02 Nov, 2011 1 commit

Split work of bgwriter between 2 processes: bgwriter and checkpointer. · 806a2aee

Committed by Simon Riggs on Nov 01, 2011

bgwriter is now a much less important process, responsible for page
cleaning duties only. checkpointer is now responsible for checkpoints
and so has a key role in shutdown. Later patches will correct doc
references to the now old idea that bgwriter performs checkpoints.
Has a beneficial effect on performance at high write rates, but this is
mainly a refactoring to more easily allow changes for power reduction,
by simplifying the previously tortuous code required to allow page
cleaning and checkpointing to time-slice in the same process.

Patch by me, Review by Dickson Guedes

27 Aug, 2011 1 commit

Add missing includes after pgrminclude run. · f261deb4

Committed by Bruce Momjian on Aug 26, 2011

11 Jun, 2011 1 commit

Use "transient" files for blind writes, take 2 · fba105b1

Committed by Alvaro Herrera on Jun 10, 2011

"Blind writes" are a mechanism to push buffers down to disk when
evicting them; since they may belong to different databases than the one
a backend is connected to, the backend does not necessarily have a
relation to link them to, and thus no way to blow them away. We were
keeping those files open indefinitely, which would cause a problem if
the underlying table was deleted, because the operating system would not
be able to reclaim the disk space used by those files.

To fix, have bufmgr mark such files as transient to smgr; the lower
layer is allowed to close the file descriptor when the current
transaction ends. We must be careful to have any other access of the
file to remove the transient markings, to prevent unnecessary expensive
system calls when evicting buffers belonging to our own database (which
files we're likely to require again soon.)

This commit fixes a bug in the previous one, which neglected to cleanly
handle the LRU ring that fd.c uses to manage open files, and caused an
unacceptable failure just before beta2 and was thus reverted.

10 Jun, 2011 2 commits

Revert "Use "transient" files for blind writes" · 9261557e

Committed by Alvaro Herrera on Jun 09, 2011

This reverts commit 54d9e8c6, which
caused a failure on the buildfarm.  Not a good thing to have just before
a beta release.

Use "transient" files for blind writes · 54d9e8c6

Committed by Alvaro Herrera on Jun 09, 2011

12 Apr, 2011 1 commit

Clean up most -Wunused-but-set-variable warnings from gcc 4.6 · 5caa3479

Committed by Peter Eisentraut on Apr 11, 2011

This warning is new in gcc 4.6 and part of -Wall.  This patch cleans
up most of the noise, but there are still some warnings that are
trickier to remove.

10 Apr, 2011 1 commit

pgindent run before PG 9.1 beta 1. · bf50caf1

Committed by Bruce Momjian on Apr 10, 2011

29 Jan, 2011 1 commit

Try to avoid running with a full fsync request queue. · 7f242d88

Committed by Robert Haas on Jan 29, 2011

When we need to insert a new entry and the queue is full, compact the
entire queue in the hopes of making room for the new entry.  Doing this
on every insertion might worsen contention on BgWriterCommLock, but
when the queue is full, it's far better than allowing the backend to
perform its own fsync, per testing by Greg Smith as reported in
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02665.php
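
One way to picture the compaction step is the sketch below. It assumes that "compacting" means dropping duplicate requests for the same file segment; the `FsyncRequest` layout and the function names are hypothetical, and the quadratic scan is chosen for brevity rather than speed:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request layout: one fsync request per (relation, segment). */
typedef struct
{
    unsigned    relfilenode;
    unsigned    segno;
} FsyncRequest;

static bool
same_request(const FsyncRequest *a, const FsyncRequest *b)
{
    return a->relfilenode == b->relfilenode && a->segno == b->segno;
}

/*
 * Remove duplicate entries in place, keeping the first occurrence of each
 * request and preserving order. Returns the new queue length.
 */
static size_t
compact_fsync_queue(FsyncRequest *queue, size_t n)
{
    size_t      kept = 0;

    for (size_t i = 0; i < n; i++)
    {
        bool        duplicate = false;

        /* Compare against the entries already kept. */
        for (size_t j = 0; j < kept; j++)
        {
            if (same_request(&queue[i], &queue[j]))
            {
                duplicate = true;
                break;
            }
        }
        if (!duplicate)
            queue[kept++] = queue[i];
    }
    return kept;
}
```

If the returned length is smaller than the queue capacity, the new entry fits and the backend does not have to perform the fsync itself.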

Original idea from Greg Smith.  Patch by me.  Review by Chris Browne
and Greg Smith

02 Jan, 2011 1 commit

Stamp copyrights for year 2011. · 5d950e3b

Committed by Bruce Momjian on Jan 01, 2011

14 Dec, 2010 1 commit

Instrument checkpoint sync calls. · 34c70c7a

Committed by Robert Haas on Dec 14, 2010

Greg Smith, reviewed by Jeff Janes

16 Nov, 2010 1 commit

Add new buffers_backend_fsync field to pg_stat_bgwriter. · 3134d886

Committed by Robert Haas on Nov 15, 2010

This new field counts the number of times that a backend which writes a
buffer out to the OS must also fsync() it.  This happens when the
bgwriter fsync request queue is full, and is generally detrimental to
performance, so it's good to know when it's happening.  Along the way,
log a new message at level DEBUG1 whenever we fail to hand off an fsync,
so that the problem can also be seen in examination of log files
(if the logging level is cranked up high enough).

Greg Smith, with minor tweaks by me.

21 Sep, 2010 1 commit

Remove cvs keywords from all files. · 9f2e2113

Committed by Magnus Hagander on Sep 20, 2010

14 Aug, 2010 1 commit

Include the backend ID in the relpath of temporary relations. · debcec7d

Committed by Robert Haas on Aug 13, 2010

This allows us to reliably remove all leftover temporary relation
files on cluster startup without reference to system catalogs or WAL;
therefore, we no longer include temporary relations in XLOG_XACT_COMMIT
and XLOG_XACT_ABORT WAL records.

Since these changes require including a backend ID in each
SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id
field has been reduced from two bytes to one, and the maximum number
of connections has been reduced from INT_MAX / 4 to 2^23-1.  It would
be possible to remove these restrictions by increasing the size of
SharedInvalidationMessage by 4 bytes, but right now that doesn't seem
like a good trade-off.
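
As a rough illustration of why a backend ID in the file name makes catalog-free cleanup possible, consider the sketch below. The `t<backend>_<relfilenode>` naming layout and both function names are assumptions made for this example, not something stated in the commit message:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical path builder. A non-negative backend_id marks a temporary
 * relation and is embedded in the file name; -1 means a regular relation.
 * The "t<backend>_<relfilenode>" layout is an assumption for illustration.
 */
static int
build_relpath(char *buf, size_t buflen,
              unsigned dboid, unsigned relfilenode, int backend_id)
{
    if (backend_id >= 0)
        return snprintf(buf, buflen, "base/%u/t%d_%u",
                        dboid, backend_id, relfilenode);
    return snprintf(buf, buflen, "base/%u/%u", dboid, relfilenode);
}

/*
 * Startup cleanup can recognize leftover temporary files from the name
 * alone: 't', one or more digits, '_', then at least one digit.
 */
static bool
looks_like_temp_relfile(const char *name)
{
    const char *p = name;

    if (*p++ != 't')
        return false;
    if (!isdigit((unsigned char) *p))
        return false;
    while (isdigit((unsigned char) *p))
        p++;
    if (*p++ != '_')
        return false;
    return isdigit((unsigned char) *p) != 0;
}
```

With a scheme like this, a startup pass only has to scan directory entries and match names; it needs neither the system catalogs nor WAL records to decide which files are leftovers.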

Review by Jaime Casanova and Tom Lane.

26 Feb, 2010 1 commit

pgindent run for 9.0 · 65e806cb

Committed by Bruce Momjian on Feb 26, 2010

03 Jan, 2010 1 commit

Update copyright for the year 2010. · 02398008

Committed by Bruce Momjian on Jan 02, 2010

06 Aug, 2009 1 commit

Improve error messages in md.c. When a filesystem operation like open() or · 23dc89d2

Committed by Heikki Linnakangas on Aug 05, 2009

fsync() fails, say "file" rather than "relation" when printing the filename.

This makes messages that display block numbers a bit confusing. For example,
in message 'could not read block 150000 of file "base/1234/5678.1"', 150000
is the block number from the beginning of the relation, ie. segment 0, not
150000th block within that segment. Per discussion, users aren't usually
interested in the exact location within the file, so we can live with that.
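
The arithmetic behind the example message can be worked through with a short sketch. It assumes the default 8kB block size and 1GB segments (131072 blocks per segment); `locate_block` and `SegmentPos` are hypothetical names:

```c
#include <stdio.h>

#define BLCKSZ       8192u      /* assumed default block size in bytes */
#define RELSEG_SIZE  131072u    /* assumed blocks per 1GB segment */

typedef struct
{
    unsigned    segno;          /* segment file suffix, e.g. ".1" */
    unsigned    blkno_in_seg;   /* block number within that segment */
    unsigned long long byte_offset; /* byte offset within the segment file */
} SegmentPos;

/* Translate a relation-wide block number into a position within a segment. */
static SegmentPos
locate_block(unsigned relblkno)
{
    SegmentPos  pos;

    pos.segno = relblkno / RELSEG_SIZE;
    pos.blkno_in_seg = relblkno % RELSEG_SIZE;
    pos.byte_offset = (unsigned long long) pos.blkno_in_seg * BLCKSZ;
    return pos;
}
```

Under these assumptions, block 150000 of the relation lands in segment 1 (hence the `.1` suffix) as block 18928 of that file, which is why the number in the message does not match a position inside the named file.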

To ease constructing error messages, add FilePathName(File) function to
return the pathname of a virtual fd.

27 Jun, 2009 1 commit

Cleanup and code review for the patch that made bgwriter active during · 2de48a83

Committed by Tom Lane on Jun 26, 2009

archive recovery. Invent a separate state variable and inquiry function
for XLogInsertAllowed() to clarify some tests and make the management of
writing the end-of-recovery checkpoint less klugy. Fix several places
that were incorrectly testing InRecovery when they should be looking at
RecoveryInProgress or XLogInsertAllowed (because they will now be executed
in the bgwriter not startup process). Clarify handling of bad LSNs passed
to XLogFlush during recovery. Use a spinlock for setting/testing
SharedRecoveryInProgress. Improve quite a lot of comments.

Heikki and Tom

26 Jun, 2009 1 commit

Fix some serious bugs in archive recovery, now that bgwriter is active · 7e48b77b

Committed by Heikki Linnakangas on Jun 25, 2009

during it:

When bgwriter is active, the startup process can't perform mdsync() correctly
because it won't see the fsync requests accumulated in bgwriter's private
pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery
checkpoint as well, when it's active.

When bgwriter is active (= archive recovery), the startup process must not
accumulate fsync requests to its own pendingOpsTable, since bgwriter won't
see them there when it performs restartpoints. Make startup process drop its
pendingOpsTable when bgwriter is launched to avoid that.

Update minimum recovery point one last time when leaving archive recovery.
It won't be updated by the end-of-recovery checkpoint because XLogFlush()
sees us as out of recovery already.

This fixes bug #4879 reported by Fujii Masao.

11 Jun, 2009 1 commit

8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list · d7471402

Committed by Bruce Momjian on Jun 11, 2009

provided by Andrew.

12 Mar, 2009 1 commit

Code review for dtrace probes added (so far) to 8.4. Adjust placement of · e04810e8

Committed by Tom Lane on Mar 11, 2009

some bufmgr probes, take out redundant and memory-leak-inducing path arguments
to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to
recalculate space used in sort__done, clean up formatting in places where
I'm not sure pgindent will do a nice job by itself.

12 Jan, 2009 1 commit

Implement prefetching via posix_fadvise() for bitmap index scans. A new · b7b8f0b6

Committed by Tom Lane on Jan 12, 2009

GUC variable effective_io_concurrency controls how many concurrent block
prefetch requests will be issued.
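
A minimal sketch of the idea, assuming an 8kB block size: while reading a list of block numbers, keep up to `io_concurrency` `posix_fadvise()` hints issued ahead of the read position. `scan_with_prefetch` is a hypothetical helper, not the executor code this commit touches, and `io_concurrency` here only stands in for the GUC:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600       /* for posix_fadvise() and pread() */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed block size in bytes */

/*
 * Read the listed blocks in order, hinting upcoming blocks to the kernel
 * so that up to io_concurrency reads can be in flight ahead of us.
 * Returns the number of blocks read in full.
 */
static int
scan_with_prefetch(int fd, const unsigned *blocks, int nblocks,
                   int io_concurrency, char *buf)
{
    int         next_prefetch = 0;
    int         nread = 0;

    for (int i = 0; i < nblocks; i++)
    {
        /* Top up the window of hinted blocks: [i, i + io_concurrency). */
        while (next_prefetch < nblocks &&
               next_prefetch < i + io_concurrency)
        {
            /* Advisory only; a failed hint is simply ignored. */
            (void) posix_fadvise(fd,
                                 (off_t) blocks[next_prefetch] * BLCKSZ,
                                 BLCKSZ, POSIX_FADV_WILLNEED);
            next_prefetch++;
        }

        if (pread(fd, buf, BLCKSZ, (off_t) blocks[i] * BLCKSZ) == BLCKSZ)
            nread++;
    }
    return nread;
}
```

Setting `io_concurrency` to 0 disables the hints entirely in this sketch, and larger values let the kernel overlap more reads at the cost of more outstanding requests.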

(The best way to handle this for plain index scans is still under debate,
so that part is not applied yet --- tgl)

Greg Stark

02 Jan, 2009 1 commit

Update copyright for 2009. · 511db38a

Committed by Bruce Momjian on Jan 01, 2009

17 Dec, 2008 1 commit

The attached patch contains a couple of fixes in the existing probes and · 5a90bc1f

Committed by Bruce Momjian on Dec 17, 2008

includes a few new ones.

- Fixed compilation errors on OS X for probes that use typedefs
- Fixed a number of probes to pass ForkNumber per the relation forks
  patch
- The new probes are those that were taken out from the previous
  submitted patch and required simple fixes. Will submit the other probes
  that may require more discussion in a separate patch.

Robert Lor

14 Nov, 2008 1 commit

Fix oversight in previous error-reporting patch; mustn't pfree path string · f06b7604

Committed by Heikki Linnakangas on Nov 14, 2008

before passing it to elog.

11 Nov, 2008 1 commit

Change error messages to print the physical path, like · 7e8b0b9a

Committed by Heikki Linnakangas on Nov 11, 2008

"base/11517/3767_fsm", instead of symbolic names like "1663/11517/3767/1",
per Alvaro's suggestion. I didn't change the messages in the higher-level
index, heap and FSM routines, though, where the fork is implicit.

11 Aug, 2008 1 commit

Introduce the concept of relation forks. An smgr relation can now consist · 3f0e808c

Committed by Heikki Linnakangas on Aug 11, 2008

of multiple forks, and each fork can be created and grown separately.

The bulk of this patch is about changing the smgr API to include an extra
ForkNumber argument in every smgr function. Also, smgrscheduleunlink and
smgrdounlink no longer implicitly call smgrclose, because other forks might
still exist after unlinking one. The callers of those functions have been
modified to call smgrclose instead.

This patch in itself doesn't have any user-visible effect, but provides the
infrastructure needed for upcoming patches. The additional forks envisioned
are a rewritten FSM implementation that doesn't rely on a fixed-size shared
memory block, and a visibility map to allow skipping portions of a table in
VACUUM that have no dead tuples.

02 May, 2008 1 commit

Remove the recently added USE_SEGMENTED_FILES option, and indeed remove all · 3c6248a8

Committed by Tom Lane on May 02, 2008

support for a nonsegmented mode from md.c.  Per recent discussions, there
doesn't seem to be much value in a "never segment" option as opposed to
segmenting with a suitably large segment size.  So instead provide a
configure-time switch to set the desired segment size in units of gigabytes.
While at it, expose a configure switch for BLCKSZ as well.

Zdenek Kotala

18 Apr, 2008 1 commit

Fix two race conditions between the pending unlink mechanism that was put in · 9cb91f90

Committed by Heikki Linnakangas on Apr 18, 2008

place to prevent reusing relation OIDs before next checkpoint, and DROP
DATABASE. First, if a database was dropped, bgwriter would still try to unlink
the files that the rmtree() call by the DROP DATABASE command has already
deleted, or is just about to delete. Second, if a database is dropped, and
another database is created with the same OID, bgwriter would in the worst
case delete a relation in the new database that happened to get the same OID
as a dropped relation in the old database.

To fix these race conditions:
- make rmtree() ignore ENOENT errors. This fixes the 1st race condition.
- make ForgetDatabaseFsyncRequests forget unlink requests as well.
- force a checkpoint in dropdb on all platforms

Since ForgetDatabaseFsyncRequests() is asynchronous, the 2nd change isn't
enough on its own to fix the problem of dropping and creating a database with
same OID, but forcing a checkpoint on DROP DATABASE makes it sufficient.

Per Tom Lane's bug report and proposal. Backpatch to 8.3.

11 Mar, 2008 1 commit

Provide a build-time option to store large relations as single files, rather · f0828b2f

Committed by Tom Lane on Mar 10, 2008

than dividing them into 1GB segments as has been our longtime practice.  This
requires working support for large files in the operating system; at least for
the time being, it won't be the default.

Zdenek Kotala

02 Jan, 2008 1 commit

Update copyrights in source tree to 2008. · 9098ab9e

Committed by Bruce Momjian on Jan 01, 2008

16 Nov, 2007 5 commits

Fix stupid typo in recently-added code :-( · eae7e00f

Committed by Tom Lane on Nov 16, 2007

Re-run pgindent with updated list of typedefs. (Updated README should · f6e8730d

Committed by Bruce Momjian on Nov 15, 2007

avoid this problem in the future.)

Use ftruncate() not truncate() in mdunlink. Seems Windows doesn't · 591b9b09

Committed by Tom Lane on Nov 15, 2007

support the latter.

pgindent run for 8.3. · fdf5a5ef

Committed by Bruce Momjian on Nov 15, 2007

Prevent re-use of a deleted relation's relfilenode until after the next · 6cc4451b

Committed by Tom Lane on Nov 15, 2007

checkpoint.  This guards against an unlikely data-loss scenario in which
we re-use the relfilenode, then crash, then replay the deletion and
recreation of the file.  Even then we'd be OK if all insertions into the
new relation had been WAL-logged ... but that's not guaranteed given all
the no-WAL-logging optimizations that have recently been added.

Patch by Heikki Linnakangas, per a discussion last month.

03 Jul, 2007 1 commit

Fix incorrect comment about the timing of AbsorbFsyncRequests() during · 83aaebba

Committed by Tom Lane on Jul 03, 2007

checkpoint. The comment claimed that we could do this anytime after
setting the checkpoint REDO point, but actually BufferSync is relying
on the assumption that buffers dumped by other backends will be fsync'd
too. So we really could not do it any sooner than we are doing it.

13 Apr, 2007 1 commit

Rearrange mdsync() looping logic to avoid the problem that a sufficiently · 995ba280

Committed by Tom Lane on Apr 12, 2007

fast flow of new fsync requests can prevent mdsync() from ever completing.
This was an unforeseen consequence of a patch added in Mar 2006 to prevent
the fsync request queue from overflowing.  Problem identified by Heikki
Linnakangas and independently by ITAGAKI Takahiro; fix based on ideas from
Takahiro-san, Heikki, and Tom.

Back-patch as far as 8.1 because a previous back-patch introduced the problem
into 8.1 ...

18 Jan, 2007 1 commit

Extend yesterday's patch so that the bgwriter is also told to forget · eddbf397

Committed by Tom Lane on Jan 17, 2007

pending fsyncs during DROP DATABASE.  Obviously necessary in hindsight :-(

17 Jan, 2007 1 commit

Revise bgwriter fsync-request mechanism to improve robustness when a table · 6d660587

Committed by Tom Lane on Jan 17, 2007

is deleted. A backend about to unlink a file now sends a "revoke fsync"
request to the bgwriter to make it clean out pending fsync requests. There
is still a race condition where the bgwriter may try to fsync after the unlink
has happened, but we can resolve that by rechecking the fsync request queue
to see if a revoke request arrived meanwhile. This eliminates the former
kluge of "just assuming" that an ENOENT failure is okay, and lets us handle
the fact that on Windows it might be EACCES too without introducing any
questionable assumptions. After an idea of mine improved by Magnus.
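
The recheck described above can be sketched roughly as follows. This is a simplified model, not the bgwriter code: `PendingFsync`, `sync_one`, and the callback parameters are hypothetical, and the two stand-in functions at the end exist only to exercise the logic:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical pending-fsync entry for one file. */
typedef struct
{
    unsigned    relfilenode;
    bool        revoked;        /* set once a "revoke fsync" request is absorbed */
} PendingFsync;

typedef enum
{
    SYNC_DONE,                  /* fsync succeeded */
    SYNC_SKIPPED,               /* request was revoked; nothing to do */
    SYNC_FAILED                 /* fsync failed and nobody revoked the request */
} SyncResult;

/*
 * Try to fsync one entry. On failure, do not guess from errno whether the
 * file was deleted; re-absorb the request queue and see whether a revoke
 * for this entry arrived in the meantime.
 */
static SyncResult
sync_one(PendingFsync *entry,
         int (*do_fsync) (unsigned relfilenode),
         void (*absorb_requests) (PendingFsync *entry))
{
    if (entry->revoked)
        return SYNC_SKIPPED;

    if (do_fsync(entry->relfilenode) == 0)
        return SYNC_DONE;

    /* The unlink may have raced with us: recheck before complaining. */
    absorb_requests(entry);
    return entry->revoked ? SYNC_SKIPPED : SYNC_FAILED;
}

/* Stand-ins for illustration: an fsync that fails as if the file vanished. */
static int
fsync_file_gone(unsigned relfilenode)
{
    (void) relfilenode;
    errno = ENOENT;
    return -1;
}

/* A queue recheck that finds a revoke request, and one that finds none. */
static void
absorb_revoke(PendingFsync *entry)
{
    entry->revoked = true;
}

static void
absorb_nothing(PendingFsync *entry)
{
    (void) entry;
}
```

Because the decision depends on whether a revoke was seen rather than on the specific `errno`, the same path covers ENOENT and the EACCES case mentioned for Windows.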

The HEAD patch doesn't apply cleanly to 8.2, but I'll see about a back-port
later. In the meantime this could do with some testing on Windows; I've been
able to force it through the code path via ENOENT, but that doesn't prove that
it actually fixes the Windows problem ...
