- 27 6月, 2011 1 次提交
-
-
由 Robert Haas 提交于
It's been like this since HOT was originally introduced, but the logic is complex enough that this is a recipe for bugs, as we've already found out with SSI. So refactor heap_hot_search_buffer() so that it can satisfy the needs of index_getnext(), and make index_getnext() use that rather than duplicating the logic. This change was originally proposed by Heikki Linnakangas as part of a larger refactoring oriented towards allowing index-only scans. I extracted and adjusted this part, since it seems to have independent merit. Review by Jeff Davis.
-
- 22 6月, 2011 1 次提交
-
-
由 Robert Haas 提交于
This involves two main changes from the previous behavior. First, when we set a bit in the visibility map, emit a new WAL record of type XLOG_HEAP2_VISIBLE. Replay sets the page-level PD_ALL_VISIBLE bit and the visibility map bit. Second, when inserting, updating, or deleting a tuple, we can no longer get away with clearing the visibility map bit after releasing the lock on the corresponding heap page, because an intervening crash might leave the visibility map bit set and the page-level bit clear. Making this work requires a bit of interface refactoring. In passing, a few minor but related cleanups: change the test in visibilitymap_set and visibilitymap_clear to throw an error if the wrong page (or no page) is pinned, rather than silently doing nothing; this case should never occur. Also, remove duplicate definitions of InvalidXLogRecPtr. Patch by me, review by Noah Misch.
-
- 15 6月, 2011 1 次提交
-
-
由 Heikki Linnakangas 提交于
snapshots, like in REINDEX, are basically non-transactional operations. The DDL operation itself might participate in SSI, but there's separate functions for that. Kevin Grittner and Dan Ports, with some changes by me.
-
- 31 5月, 2011 1 次提交
-
-
由 Heikki Linnakangas 提交于
On further analysis, it turns out that it is not needed to duplicate predicate locks to the new row version at update, the lock on the version that the transaction saw as visible is enough. However, there was a different bug in the code that checks for dangerous structures when a new rw-conflict happens. Fix that bug, and remove all the row-version chaining related code. Kevin Grittner & Dan Ports, with some comment editorialization by me.
-
- 25 4月, 2011 1 次提交
-
-
由 Robert Haas 提交于
Bug #5899, reported by Marko Tiikkaja. Heikki Linnakangas, reviewed by Kevin Grittner and Dan Ports.
-
- 10 4月, 2011 1 次提交
-
-
由 Bruce Momjian 提交于
-
- 04 3月, 2011 1 次提交
-
-
由 Heikki Linnakangas 提交于
CheckForSerializableConflictOut(), because it can set hint bits. YAMAMOTO Takashi
-
- 08 2月, 2011 1 次提交
-
-
由 Heikki Linnakangas 提交于
Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
-
- 02 1月, 2011 2 次提交
-
-
由 Robert Haas 提交于
Foreign tables are a core component of SQL/MED. This commit does not provide a working SQL/MED infrastructure, because foreign tables cannot yet be queried. Support for foreign table scans will need to be added in a future patch. However, this patch creates the necessary system catalog structure, syntax support, and support for ancillary operations such as COMMENT and SECURITY LABEL. Shigeru Hanada, heavily revised by Robert Haas
-
由 Bruce Momjian 提交于
-
- 14 12月, 2010 1 次提交
-
-
由 Robert Haas 提交于
This commit replaces pg_class.relistemp with pg_class.relpersistence; and also modifies the RangeVar node type to carry relpersistence rather than istemp. It also removes removes rd_istemp from RelationData and instead performs the correct computation based on relpersistence. For clarity, we add three new macros: RelationNeedsWAL(), RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we can clarify the purpose of each check that previous depended on rd_istemp. This is intended as infrastructure for the upcoming unlogged tables patch, as well as for future possible work on global temporary tables.
-
- 09 12月, 2010 2 次提交
-
-
由 Simon Riggs 提交于
-
由 Simon Riggs 提交于
Hot Standby conflicts only with tuples that were visible at some point. So ignore tuples from aborted transactions or for tuples updated/deleted during the inserting transaction when generating the conflict transaction ids. Following detailed analysis and test case by Noah Misch. Original report covered btree delete records, correctly observed by Heikki Linnakangas that this applies to other cases also. Fix covers all sources of cleanup records via common code.
-
- 21 9月, 2010 1 次提交
-
-
由 Magnus Hagander 提交于
-
- 12 9月, 2010 1 次提交
-
-
由 Joe Conway 提交于
transaction snapshots, i.e. a snapshot registered at the beginning of a transaction. Change variable naming and comments to reflect this reality in preparation for a future, truly serializable mode, e.g. Serializable Snapshot Isolation (SSI). For the moment transaction snapshots are still used to implement SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin Grittner and Dan Ports with review and some minor wording changes by me.
-
- 30 7月, 2010 1 次提交
-
-
由 Robert Haas 提交于
If a zeroed page is present in the heap, ALTER TABLE .. SET TABLESPACE will set the LSN and TLI while copying it, which is wrong, and heap_xlog_newpage() will do the same thing during replay, so the corruption propagates to any standby. Note, however, that the bug can't be demonstrated unless archiving is enabled, since in that case we skip WAL logging altogether, and the LSN/TLI are not set. Back-patch to 8.0; prior releases do not have tablespaces. Analysis and patch by Jeff Davis. Adjustments for back-branches and minor wordsmithing by me.
-
- 07 7月, 2010 1 次提交
-
-
由 Bruce Momjian 提交于
-
- 03 5月, 2010 2 次提交
-
-
由 Tom Lane 提交于
-
由 Tom Lane 提交于
field of the WAL record. The previous coding always wrote to the main fork, resulting in data corruption if the page was meant to go into a non-default fork. At present, the only operation that can produce such WAL records is ALTER TABLE/INDEX SET TABLESPACE when executed with archive_mode = on. Data corruption would be observed on standby slaves, and could occur on the master as well if a database crash and recovery occurred after committing the ALTER and before the next checkpoint. Per report from Gordon Shannon. Back-patch to 8.4; the problem doesn't exist in earlier branches because we didn't have a concept of multiple relation forks then.
-
- 22 4月, 2010 1 次提交
-
-
由 Simon Riggs 提交于
come from the realistion that HEAP2_CLEAN records don't always remove user visible data, so conflict processing for them can be skipped. Confirm validity using Assert checks, clarify circumstances under which we log heap_cleanup_info records. Tuning arises from bug fixing of earlier safety check failures.
-
- 26 2月, 2010 1 次提交
-
-
由 Bruce Momjian 提交于
-
- 15 2月, 2010 1 次提交
-
-
由 Robert Haas 提交于
The purpose of this change is to eliminate the need for every caller of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists, GetSysCacheOid, and SearchSysCacheList to know the maximum number of allowable keys for a syscache entry (currently 4). This will make it far easier to increase the maximum number of keys in a future release should we choose to do so, and it makes the code shorter, too. Design and review by Tom Lane.
-
- 08 2月, 2010 1 次提交
-
-
由 Tom Lane 提交于
VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby. Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby). We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases.
-
- 03 2月, 2010 1 次提交
-
-
由 Heikki Linnakangas 提交于
heap_sync() to the callers, because heap_sync() is sometimes called even if the operation itself is WAL-logged. This eliminates the bogus unlogged records from CLUSTER that Simon Riggs reported, patch by Fujii Masao.
-
- 30 1月, 2010 1 次提交
-
-
由 Simon Riggs 提交于
records for heap and btree. Minor change, mostly API changes to pass through the required values. This is a simple change though also provides the refactoring required for further enhancements to conflict processing using the relOid. Changes only have effect during Hot Standby.
-
- 21 1月, 2010 1 次提交
-
-
由 Heikki Linnakangas 提交于
that would've been WAL-logged if archiving was enabled. If we encounter such records in archive recovery anyway, we know that some data is missing from the log. A WARNING is emitted in that case. Original patch by Fujii Masao, with changes by me.
-
- 14 1月, 2010 1 次提交
-
-
由 Simon Riggs 提交于
of this are to centralise the conflict code to allow further change, as well as to allow passing through the full reason for the conflict through to the conflicting backends. Backend state alters how we can handle different types of conflict so this is now required. As originally suggested by Heikki, no longer optional.
-
- 10 1月, 2010 1 次提交
-
-
由 Robert Haas 提交于
Previously, fastgetattr() and heap_getattr() tested their fourth argument against a null pointer, but any attempt to use them with a literal-NULL fourth argument evaluated to *(void *)0, resulting in a compiler error. Remove these NULL tests to avoid leading future readers of this code to believe that this has a chance of working. Also clean up related legacy code in nocachegetattr(), heap_getsysattr(), and nocache_index_getattr(). The new coding standard is that any code which calls a getattr-type function or macro which takes an isnull argument MUST pass a valid boolean pointer. Per discussion with Bruce Momjian, Tom Lane, Alvaro Herrera.
-
- 03 1月, 2010 1 次提交
-
-
由 Bruce Momjian 提交于
-
- 19 12月, 2009 1 次提交
-
-
由 Simon Riggs 提交于
Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
-
- 24 8月, 2009 1 次提交
-
-
由 Tom Lane 提交于
"all tuples visible" flag in heap page headers. The flag update *must* be applied before calling XLogInsert, but heap_update and the tuple moving routines in VACUUM FULL were ignoring this rule. A crash and replay could therefore leave the flag incorrectly set, causing rows to appear visible in seqscans when they should not be. This might explain recent reports of data corruption from Jeff Ross and others. In passing, do a bit of editorialization on comments in visibilitymap.c.
-
- 11 6月, 2009 2 次提交
-
-
由 Bruce Momjian 提交于
provided by Andrew.
-
由 Tom Lane 提交于
node starts from the same place as the first scan did. This avoids surprising behavior of scrollable and WITH HOLD cursors, as seen in Mark Kirkwood's bug report of yesterday. It's not entirely clear whether a rescan should be forced to drop out of the syncscan mode, but for the moment I left the code behaving the same on that point. Any change there would only be a performance and not a correctness issue, anyway. Back-patch to 8.3, since the unstable behavior was created by the syncscan patch.
-
- 13 5月, 2009 1 次提交
-
-
由 Tom Lane 提交于
errors when tables are concurrently dropped. To do this we must take lock on each relation before we check its privileges. The old code was trying to do that the other way around, which is a bit pointless when there are lots of other commands that lock relations before checking privileges. I did keep it checking each relation's privilege before locking the next relation, which is a detail that ALTER TABLE isn't too picky about.
-
- 21 1月, 2009 1 次提交
-
-
由 Heikki Linnakangas 提交于
be used instead of the normal exclusive lock, and make WAL redo functions responsible for calling RestoreBkpBlocks(). They know better what kind of a lock they need. At the moment, this just moves things around with no functional change, but makes the hot standby patch that's under review cleaner.
-
- 02 1月, 2009 1 次提交
-
-
由 Bruce Momjian 提交于
-
- 17 12月, 2008 1 次提交
-
-
由 Tom Lane 提交于
the other major heapam.c functions. The only known consequence of this omission is that UPDATE RETURNING failed to return the correct value for "tableoid", as per report from KaiGai Kohei. Back-patch to 8.2. Arguably it's wrong all the way back; but without evidence of visible breakage before RETURNING was added, I'll desist from patching the older branches.
-
- 03 12月, 2008 1 次提交
-
-
由 Heikki Linnakangas 提交于
heap page, where a set bit indicates that all tuples on the page are visible to all transactions, and the page therefore doesn't need vacuuming. It is stored in a new relation fork. Lazy vacuum uses the visibility map to skip pages that don't need vacuuming. Vacuum is also responsible for setting the bits in the map. In the future, this can hopefully be used to implement index-only-scans, but we can't currently guarantee that the visibility map is always 100% up-to-date. In addition to the visibility map, there's a new PD_ALL_VISIBLE flag on each heap page, also indicating that all tuples on the page are visible to all transactions. It's important that this flag is kept up-to-date. It is also used to skip visibility tests in sequential scans, which gives a small performance gain on seqscans.
-
- 19 11月, 2008 1 次提交
-
-
由 Heikki Linnakangas 提交于
truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.
-
- 07 11月, 2008 1 次提交
-
-
由 Tom Lane 提交于
(but not locked, as that would risk deadlocks). Also, make it work in a small ring of buffers to avoid having bulk inserts trash the whole buffer arena. Robert Haas, after an idea of Simon Riggs'.
-