- 21 12月, 2011 1 次提交
-
-
由 Tom Lane 提交于
smgrdounlink takes care to not throw an ERROR if it fails to unlink something, but that caution was rendered useless by commit 33960006, which put an smgrexists call in front of it; smgrexists *does* throw error if anything looks funny, such as getting a permissions error from trying to open the file. If that happens post-commit, you get a PANIC, and what's worse the same logic appears in the WAL replay code, so the database even fails to restart. Restore the intended behavior by removing the smgrexists call --- it isn't accomplishing anything that we can't do better by adjusting mdunlink's ideas of whether it ought to warn about ENOENT or not. Per report from Joseph Shraibman of unrecoverable crash after trying to drop a table whose FSM fork had somehow gotten chmod'd to 000 permissions. Backpatch to 8.4, where the bogus coding was introduced.
-
- 02 11月, 2011 1 次提交
-
-
由 Simon Riggs 提交于
bgwriter is now a much less important process, responsible for page cleaning duties only. checkpointer is now responsible for checkpoints and so has a key role in shutdown. Later patches will correct doc references to the now old idea that bgwriter performs checkpoints. Has beneficial effect on performance at high write rates, but mainly refactoring to more easily allow changes for power reduction by simplifying previously tortuous code around required to allow page cleaning and checkpointing to time slice in the same process. Patch by me, Review by Dickson Guedes
-
- 27 8月, 2011 1 次提交
-
-
由 Bruce Momjian 提交于
-
- 11 6月, 2011 1 次提交
-
-
由 Alvaro Herrera 提交于
"Blind writes" are a mechanism to push buffers down to disk when evicting them; since they may belong to different databases than the one a backend is connected to, the backend does not necessarily have a relation to link them to, and thus no way to blow them away. We were keeping those files open indefinitely, which would cause a problem if the underlying table was deleted, because the operating system would not be able to reclaim the disk space used by those files. To fix, have bufmgr mark such files as transient to smgr; the lower layer is allowed to close the file descriptor when the current transaction ends. We must be careful to have any other access of the file to remove the transient markings, to prevent unnecessary expensive system calls when evicting buffers belonging to our own database (which files we're likely to require again soon.) This commit fixes a bug in the previous one, which neglected to cleanly handle the LRU ring that fd.c uses to manage open files, and caused an unacceptable failure just before beta2 and was thus reverted.
-
- 10 6月, 2011 2 次提交
-
-
由 Alvaro Herrera 提交于
This reverts commit 54d9e8c6, which caused a failure on the buildfarm. Not a good thing to have just before a beta release.
-
由 Alvaro Herrera 提交于
"Blind writes" are a mechanism to push buffers down to disk when evicting them; since they may belong to different databases than the one a backend is connected to, the backend does not necessarily have a relation to link them to, and thus no way to blow them away. We were keeping those files open indefinitely, which would cause a problem if the underlying table was deleted, because the operating system would not be able to reclaim the disk space used by those files. To fix, have bufmgr mark such files as transient to smgr; the lower layer is allowed to close the file descriptor when the current transaction ends. We must be careful to have any other access of the file to remove the transient markings, to prevent unnecessary expensive system calls when evicting buffers belonging to our own database (which files we're likely to require again soon.)
-
- 12 4月, 2011 1 次提交
-
-
由 Peter Eisentraut 提交于
This warning is new in gcc 4.6 and part of -Wall. This patch cleans up most of the noise, but there are some still warnings that are trickier to remove.
-
- 10 4月, 2011 1 次提交
-
-
由 Bruce Momjian 提交于
-
- 29 1月, 2011 1 次提交
-
-
由 Robert Haas 提交于
When we need to insert a new entry and the queue is full, compact the entire queue in the hopes of making room for the new entry. Doing this on every insertion might worsen contention on BgWriterCommLock, but when the queue it's full, it's far better than allowing the backend to perform its own fsync, per testing by Greg Smith as reported in http://archives.postgresql.org/pgsql-hackers/2011-01/msg02665.php Original idea from Greg Smith. Patch by me. Review by Chris Browne and Greg Smith
-
- 02 1月, 2011 1 次提交
-
-
由 Bruce Momjian 提交于
-
- 14 12月, 2010 1 次提交
-
-
由 Robert Haas 提交于
Greg Smith, reviewed by Jeff Janes
-
- 16 11月, 2010 1 次提交
-
-
由 Robert Haas 提交于
This new field counts the number of times that a backend which writes a buffer out to the OS must also fsync() it. This happens when the bgwriter fsync request queue is full, and is generally detrimental to performance, so it's good to know when it's happening. Along the way, log a new message at level DEBUG1 whenever we fail to hand off an fsync, so that the problem can also be seen in examination of log files (if the logging level is cranked up high enough). Greg Smith, with minor tweaks by me.
-
- 21 9月, 2010 1 次提交
-
-
由 Magnus Hagander 提交于
-
- 14 8月, 2010 1 次提交
-
-
由 Robert Haas 提交于
This allows us to reliably remove all leftover temporary relation files on cluster startup without reference to system catalogs or WAL; therefore, we no longer include temporary relations in XLOG_XACT_COMMIT and XLOG_XACT_ABORT WAL records. Since these changes require including a backend ID in each SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id field has been reduced from two bytes to one, and the maximum number of connections has been reduced from INT_MAX / 4 to 2^23-1. It would be possible to remove these restrictions by increasing the size of SharedInvalidationMessage by 4 bytes, but right now that doesn't seem like a good trade-off. Review by Jaime Casanova and Tom Lane.
-
- 26 2月, 2010 1 次提交
-
-
由 Bruce Momjian 提交于
-
- 03 1月, 2010 1 次提交
-
-
由 Bruce Momjian 提交于
-
- 06 8月, 2009 1 次提交
-
-
由 Heikki Linnakangas 提交于
fsync() fails, say "file" rather than "relation" when printing the filename. This makes messages that display block numbers a bit confusing. For example, in message 'could not read block 150000 of file "base/1234/5678.1"', 150000 is the block number from the beginning of the relation, ie. segment 0, not 150000th block within that segment. Per discussion, users aren't usually interested in the exact location within the file, so we can live with that. To ease constructing error messages, add FilePathName(File) function to return the pathname of a virtual fd.
-
- 27 6月, 2009 1 次提交
-
-
由 Tom Lane 提交于
archive recovery. Invent a separate state variable and inquiry function for XLogInsertAllowed() to clarify some tests and make the management of writing the end-of-recovery checkpoint less klugy. Fix several places that were incorrectly testing InRecovery when they should be looking at RecoveryInProgress or XLogInsertAllowed (because they will now be executed in the bgwriter not startup process). Clarify handling of bad LSNs passed to XLogFlush during recovery. Use a spinlock for setting/testing SharedRecoveryInProgress. Improve quite a lot of comments. Heikki and Tom
-
- 26 6月, 2009 1 次提交
-
-
由 Heikki Linnakangas 提交于
during it: When bgwriter is active, the startup process can't perform mdsync() correctly because it won't see the fsync requests accumulated in bgwriter's private pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery checkpoint as well, when it's active. When bgwriter is active (= archive recovery), the startup process must not accumulate fsync requests to its own pendingOpsTable, since bgwriter won't see them there when it performs restartpoints. Make startup process drop its pendingOpsTable when bgwriter is launched to avoid that. Update minimum recovery point one last time when leaving archive recovery. It won't be updated by the end-of-recovery checkpoint because XLogFlush() sees us as out of recovery already. This fixes bug #4879 reported by Fujii Masao.
-
- 11 6月, 2009 1 次提交
-
-
由 Bruce Momjian 提交于
provided by Andrew.
-
- 12 3月, 2009 1 次提交
-
-
由 Tom Lane 提交于
some bufmgr probes, take out redundant and memory-leak-inducing path arguments to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to recalculate space used in sort__done, clean up formatting in places where I'm not sure pgindent will do a nice job by itself.
-
- 12 1月, 2009 1 次提交
-
-
由 Tom Lane 提交于
GUC variable effective_io_concurrency controls how many concurrent block prefetch requests will be issued. (The best way to handle this for plain index scans is still under debate, so that part is not applied yet --- tgl) Greg Stark
-
- 02 1月, 2009 1 次提交
-
-
由 Bruce Momjian 提交于
-
- 17 12月, 2008 1 次提交
-
-
由 Bruce Momjian 提交于
includes a few new ones. - Fixed compilation errors on OS X for probes that use typedefs - Fixed a number of probes to pass ForkNumber per the relation forks patch - The new probes are those that were taken out from the previous submitted patch and required simple fixes. Will submit the other probes that may require more discussion in a separate patch. Robert Lor
-
- 14 11月, 2008 1 次提交
-
-
由 Heikki Linnakangas 提交于
before passing it to elog.
-
- 11 11月, 2008 1 次提交
-
-
由 Heikki Linnakangas 提交于
"base/11517/3767_fsm", instead of symbolic names like "1663/11517/3767/1", per Alvaro's suggestion. I didn't change the messages in the higher-level index, heap and FSM routines, though, where the fork is implicit.
-
- 11 8月, 2008 1 次提交
-
-
由 Heikki Linnakangas 提交于
of multiple forks, and each fork can be created and grown separately. The bulk of this patch is about changing the smgr API to include an extra ForkNumber argument in every smgr function. Also, smgrscheduleunlink and smgrdounlink no longer implicitly call smgrclose, because other forks might still exist after unlinking one. The callers of those functions have been modified to call smgrclose instead. This patch in itself doesn't have any user-visible effect, but provides the infrastructure needed for upcoming patches. The additional forks envisioned are a rewritten FSM implementation that doesn't rely on a fixed-size shared memory block, and a visibility map to allow skipping portions of a table in VACUUM that have no dead tuples.
-
- 02 5月, 2008 1 次提交
-
-
由 Tom Lane 提交于
support for a nonsegmented mode from md.c. Per recent discussions, there doesn't seem to be much value in a "never segment" option as opposed to segmenting with a suitably large segment size. So instead provide a configure-time switch to set the desired segment size in units of gigabytes. While at it, expose a configure switch for BLCKSZ as well. Zdenek Kotala
-
- 18 4月, 2008 1 次提交
-
-
由 Heikki Linnakangas 提交于
place to prevent reusing relation OIDs before next checkpoint, and DROP DATABASE. First, if a database was dropped, bgwriter would still try to unlink the files that the rmtree() call by the DROP DATABASE command has already deleted, or is just about to delete. Second, if a database is dropped, and another database is created with the same OID, bgwriter would in the worst case delete a relation in the new database that happened to get the same OID as a dropped relation in the old database. To fix these race conditions: - make rmtree() ignore ENOENT errors. This fixes the 1st race condition. - make ForgetDatabaseFsyncRequests forget unlink requests as well. - force checkpoint on in dropdb on all platforms Since ForgetDatabaseFsyncRequests() is asynchronous, the 2nd change isn't enough on its own to fix the problem of dropping and creating a database with same OID, but forcing a checkpoint on DROP DATABASE makes it sufficient. Per Tom Lane's bug report and proposal. Backpatch to 8.3.
-
- 11 3月, 2008 1 次提交
-
-
由 Tom Lane 提交于
than dividing them into 1GB segments as has been our longtime practice. This requires working support for large files in the operating system; at least for the time being, it won't be the default. Zdenek Kotala
-
- 02 1月, 2008 1 次提交
-
-
由 Bruce Momjian 提交于
-
- 16 11月, 2007 5 次提交
-
-
由 Tom Lane 提交于
-
由 Bruce Momjian 提交于
avoid this problem in the future.)
-
由 Tom Lane 提交于
support the latter.
-
由 Bruce Momjian 提交于
-
由 Tom Lane 提交于
checkpoint. This guards against an unlikely data-loss scenario in which we re-use the relfilenode, then crash, then replay the deletion and recreation of the file. Even then we'd be OK if all insertions into the new relation had been WAL-logged ... but that's not guaranteed given all the no-WAL-logging optimizations that have recently been added. Patch by Heikki Linnakangas, per a discussion last month.
-
- 03 7月, 2007 1 次提交
-
-
由 Tom Lane 提交于
checkpoint. The comment claimed that we could do this anytime after setting the checkpoint REDO point, but actually BufferSync is relying on the assumption that buffers dumped by other backends will be fsync'd too. So we really could not do it any sooner than we are doing it.
-
- 13 4月, 2007 1 次提交
-
-
由 Tom Lane 提交于
fast flow of new fsync requests can prevent mdsync() from ever completing. This was an unforeseen consequence of a patch added in Mar 2006 to prevent the fsync request queue from overflowing. Problem identified by Heikki Linnakangas and independently by ITAGAKI Takahiro; fix based on ideas from Takahiro-san, Heikki, and Tom. Back-patch as far as 8.1 because a previous back-patch introduced the problem into 8.1 ...
-
- 18 1月, 2007 1 次提交
-
-
由 Tom Lane 提交于
pending fsyncs during DROP DATABASE. Obviously necessary in hindsight :-(
-
- 17 1月, 2007 1 次提交
-
-
由 Tom Lane 提交于
is deleted. A backend about to unlink a file now sends a "revoke fsync" request to the bgwriter to make it clean out pending fsync requests. There is still a race condition where the bgwriter may try to fsync after the unlink has happened, but we can resolve that by rechecking the fsync request queue to see if a revoke request arrived meanwhile. This eliminates the former kluge of "just assuming" that an ENOENT failure is okay, and lets us handle the fact that on Windows it might be EACCES too without introducing any questionable assumptions. After an idea of mine improved by Magnus. The HEAD patch doesn't apply cleanly to 8.2, but I'll see about a back-port later. In the meantime this could do with some testing on Windows; I've been able to force it through the code path via ENOENT, but that doesn't prove that it actually fixes the Windows problem ...
-