Commit 9ced0134, authored by Simon Riggs

Avoid early reuse of btree pages, causing incorrect query results.

When we allowed read-only transactions to skip assigning XIDs,
we introduced the possibility that a fully deleted btree page
could be reused. This broke the index link sequence, which could
then lead to indexscans silently returning fewer rows than would
have been correct. The actual incidence of silent errors from
this is thought to be very low because of the specific workload
and locking preconditions required. The fix is to recycle a
deleted page only if its opaque->btpo.xact precedes
RecentGlobalXmin.

Noah Misch, reviewed and backpatched by Simon Riggs
Parent 485e12fb
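The commit message says that reusing a deleted page too early breaks the index link sequence and makes scans silently return fewer rows. The toy model below is a hypothetical sketch, not nbtree code: `ToyPage`, `run_scenario` and the four-block layout are invented for illustration, and only entry counts are tracked. A scan reads block 1 and remembers its right-link (block 2). Block 2, which is empty, is then deleted. In the early-reuse case, it is immediately recycled as the new right half of a split of block 4.

```c
#include <assert.h>
#include <string.h>

#define P_NONE 0   /* "no right sibling"; block 0 is left unused */
#define NBLOCKS 5  /* blocks 1..4 are used */

/* Hypothetical, heavily simplified leaf page: only what the scan needs. */
typedef struct ToyPage {
    int deleted;   /* unlinked from the tree, but still in place */
    int rightlink; /* block number of the right sibling */
    int nitems;    /* number of index entries on the page */
} ToyPage;

/*
 * Returns how many entries a left-to-right scan sees.  If recycle_early is
 * nonzero, the deleted block is reused before the scan has stepped past it.
 */
static int run_scenario(int recycle_early)
{
    ToyPage pages[NBLOCKS];
    int count = 0;
    int next;

    memset(pages, 0, sizeof(pages));
    /* Initial chain 1 -> 2 -> 3 -> 4; block 2 is empty, 8 entries in total. */
    pages[1] = (ToyPage) {0, 2, 2};
    pages[2] = (ToyPage) {0, 3, 0};
    pages[3] = (ToyPage) {0, 4, 2};
    pages[4] = (ToyPage) {0, P_NONE, 4};

    /* The scan reads block 1 and remembers its right-link. */
    count += pages[1].nitems;
    next = pages[1].rightlink;

    /* Meanwhile, empty block 2 is deleted; its own right-link stays intact. */
    pages[1].rightlink = 3;
    pages[2].deleted = 1;

    if (recycle_early) {
        /* Block 2 is reused as the right half of a split of block 4. */
        pages[2] = (ToyPage) {0, pages[4].rightlink, 2};
        pages[4].nitems = 2;
        pages[4].rightlink = 2;
    }

    /* The scan resumes at the block number it remembered. */
    while (next != P_NONE) {
        assert(next > 0 && next < NBLOCKS);
        if (!pages[next].deleted)
            count += pages[next].nitems;
        next = pages[next].rightlink;
    }
    return count;
}
```

Under these assumptions, `run_scenario(0)` returns 8 and `run_scenario(1)` returns 4. The scan lands on the recycled block, reads the two entries moved there, follows that block's new right-link to the end of the chain, and never visits block 3 or the left half of block 4. No error is raised.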
```diff
@@ -258,13 +258,15 @@ we need to be sure we don't miss or re-scan any items.
 A deleted page can only be reclaimed once there is no scan or search that
 has a reference to it; until then, it must stay in place with its
-right-link undisturbed. We implement this by waiting until all
-transactions that were running at the time of deletion are dead; which is
+right-link undisturbed. We implement this by waiting until all active
+snapshots and registered snapshots as of the deletion are gone; which is
 overly strong, but is simple to implement within Postgres. When marked
 dead, a deleted page is labeled with the next-transaction counter value.
 VACUUM can reclaim the page for re-use when this transaction number is
-older than the oldest open transaction. (NOTE: VACUUM FULL can reclaim
-such pages immediately.)
+older than RecentGlobalXmin. As collateral damage, this implementation
+also waits for running XIDs with no snapshots and for snapshots taken
+until the next transaction to allocate an XID commits.
+(NOTE: VACUUM FULL can reclaim such pages immediately.)
 Reclaiming a page doesn't actually change its state on disk --- we simply
 record it in the shared-memory free space map, from which it will be
```
......
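The README change replaces "the oldest open transaction" with RecentGlobalXmin, a horizon that also covers snapshots held by transactions that never allocated an XID. The sketch below is a toy model under stated assumptions, not PostgreSQL's snapshot code. `ToyBackend`, `oldest_running_xid`, `global_xmin` and `can_recycle` are hypothetical names, XIDs are plain integers compared without wraparound, and 0 means "none".

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical per-backend state: an XID (if one was assigned) and the
 * xmin of its oldest snapshot (if it holds one). */
typedef struct ToyBackend {
    TransactionId xid;
    TransactionId xmin;
} ToyBackend;

/* Oldest XID still running; read-only backends without an XID are invisible. */
static TransactionId oldest_running_xid(const ToyBackend *b, int n,
                                        TransactionId next_xid)
{
    TransactionId result = next_xid;
    for (int i = 0; i < n; i++)
        if (b[i].xid != InvalidTransactionId && b[i].xid < result)
            result = b[i].xid;
    return result;
}

/* Same as above, but also honours every backend's snapshot xmin. */
static TransactionId global_xmin(const ToyBackend *b, int n,
                                 TransactionId next_xid)
{
    TransactionId result = oldest_running_xid(b, n, next_xid);
    for (int i = 0; i < n; i++)
        if (b[i].xmin != InvalidTransactionId && b[i].xmin < result)
            result = b[i].xmin;
    return result;
}

/* A deleted page labelled `label` may be reused once label < horizon
 * (plain comparison in this toy; the real code compares modulo 2^32). */
static int can_recycle(TransactionId label, TransactionId horizon)
{
    assert(horizon != InvalidTransactionId);
    return label < horizon;
}
```

As an example, consider a read-only reader with no XID and a snapshot xmin of 90, a writer with XID 104 and xmin 104, a next XID of 105, and a page labelled 100. The XID-only horizon is 104, so the page looks reusable. The snapshot-aware horizon is 90, so the page stays in place while the reader's scan may still hold a link to it.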
```diff
@@ -633,7 +633,7 @@ _bt_page_recyclable(Page page)
      */
     opaque = (BTPageOpaque) PageGetSpecialPointer(page);
     if (P_ISDELETED(opaque) &&
-        TransactionIdPrecedesOrEquals(opaque->btpo.xact, RecentXmin))
+        TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
         return true;
     return false;
 }
```
......
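The code change does two things. It switches the horizon from RecentXmin to RecentGlobalXmin, and it replaces TransactionIdPrecedesOrEquals with the strict TransactionIdPrecedes. Both functions compare XIDs circularly, modulo 2^32. The sketch below re-implements that comparison in the style of PostgreSQL's XID helpers so the changed test can be checked in isolation. The names `xid_precedes`, `xid_precedes_or_equals`, `ToyBTPageOpaque` and `toy_page_recyclable` are hypothetical, and the cast to `int32_t` assumes two's-complement conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs 0..2 are reserved (invalid, bootstrap, frozen); normal XIDs start at 3. */
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* id1 < id2, comparing normal XIDs modulo 2^32. */
static bool xid_precedes(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 < id2;
    return (int32_t) (id1 - id2) < 0;
}

/* id1 <= id2, same circular comparison. */
static bool xid_precedes_or_equals(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 <= id2;
    return (int32_t) (id1 - id2) <= 0;
}

/* Hypothetical stand-in for the fields of the btree page opaque data used here. */
typedef struct ToyBTPageOpaque {
    bool deleted;       /* plays the role of P_ISDELETED(opaque) */
    TransactionId xact; /* plays the role of opaque->btpo.xact */
} ToyBTPageOpaque;

/* New rule: reusable only if the label strictly precedes the global horizon. */
static bool toy_page_recyclable(const ToyBTPageOpaque *opaque,
                                TransactionId global_xmin)
{
    assert(TransactionIdIsNormal(global_xmin));
    return opaque->deleted && xid_precedes(opaque->xact, global_xmin);
}
```

In this sketch, a deleted page labelled 1000 is not reusable when the horizon is exactly 1000, although the old `PrecedesOrEquals` form of the test would have accepted it. A label of 0xFFFFFFF0 still precedes a horizon of 5 after wraparound.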