btrfs: release upper nodes when reading stale btree node from disk

When reading a btree node (or leaf), at read_block_for_search(), if we can't find its extent buffer in the cache (the fs_info->buffer_radix radix tree), then we unlock all upper level nodes before reading the btree node/leaf from disk, to prevent blocking other tasks for too long. However if we find that the extent buffer is in the cache but it is not up to date, we don't unlock upper level nodes before reading it from disk, potentially blocking other tasks on upper level nodes for too long. Fix this inconsistent behaviour by unlocking upper level nodes if we need to read a node/leaf from disk because its in-memory extent buffer is not up to date. If we unlocked upper level nodes then we must return -EAGAIN to the caller, just like the case where the extent buffer is not cached in memory. And like that case, we determine if upper level nodes are locked by checking only if the parent node is locked - if it isn't, then no other upper level nodes are locked. This is actually a rare case, as if we have an extent buffer in memory, it typically has the uptodate flag set and passes all the checks done by btrfs_buffer_uptodate(). Reviewed-by: N Josef Bacik <josef@toxicpanda.com> Signed-off-by: N Filipe Manana <fdmanana@suse.com> Reviewed-by: N David Sterba <dsterba@suse.com> Signed-off-by: N David Sterba <dsterba@suse.com>

btrfs: release upper nodes when reading stale btree node from disk
When reading a btree node (or leaf), at read_block_for_search(), if we can't find its extent buffer in the cache (the fs_info->buffer_radix radix tree), then we unlock all upper level nodes before reading the btree node/leaf from disk, to prevent blocking other tasks for too long. However if we find that the extent buffer is in the cache but it is not up to date, we don't unlock upper level nodes before reading it from disk, potentially blocking other tasks on upper level nodes for too long. Fix this inconsistent behaviour by unlocking upper level nodes if we need to read a node/leaf from disk because its in-memory extent buffer is not up to date. If we unlocked upper level nodes then we must return -EAGAIN to the caller, just like the case where the extent buffer is not cached in memory. And like that case, we determine if upper level nodes are locked by checking only if the parent node is locked - if it isn't, then no other upper level nodes are locked. This is actually a rare case, as if we have an extent buffer in memory, it typically has the uptodate flag set and passes all the checks done by btrfs_buffer_uptodate(). Reviewed-by: N Josef Bacik <josef@toxicpanda.com> Signed-off-by: N Filipe Manana <fdmanana@suse.com> Reviewed-by: N David Sterba <dsterba@suse.com> Signed-off-by: N David Sterba <dsterba@suse.com>
b246666e · Filipe Manana · David Sterba · 4bb59055 · b246666e
隐藏空白更改
内联并排

Showing with 19 addition and 9 deletion

fs/btrfs/ctree.c fs/btrfs/ctree.c +19 -9

未找到文件。
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1409,12 +1409,21 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
 	struct btrfs_key first_key;
 	int ret;
 	int parent_level;
+	bool unlock_up;

+	unlock_up = ((level + 1 < BTRFS_MAX_LEVEL) && p->locks[level + 1]);
 	blocknr = btrfs_node_blockptr(*eb_ret, slot);
 	gen = btrfs_node_ptr_generation(*eb_ret, slot);
 	parent_level = btrfs_header_level(*eb_ret);
 	btrfs_node_key_to_cpu(*eb_ret, &first_key, slot);

+	/*
+	 * If we need to read an extent buffer from disk and we are holding locks
+	 * on upper level nodes, we unlock all the upper nodes before reading the
+	 * extent buffer, and then return -EAGAIN to the caller as it needs to
+	 * restart the search. We don't release the lock on the current level
+	 * because we need to walk this node to figure out which blocks to read.
+	 */
 	tmp = find_extent_buffer(fs_info, blocknr);
 	if (tmp) {
 		if (p->reada == READA_FORWARD_ALWAYS)
@@ -1436,6 +1445,9 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
 			return 0;
 		}

+		if (unlock_up)
+			btrfs_unlock_up_safe(p, level + 1);
+
 		/* now we're allowed to do a blocking uptodate check */
 		ret = btrfs_read_buffer(tmp, gen, parent_level - 1, &first_key);
 		if (ret) {
@@ -1443,17 +1455,14 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
 			btrfs_release_path(p);
 			return -EIO;
 		}
-		*eb_ret = tmp;
-		return 0;
+
+		if (unlock_up)
+			ret = -EAGAIN;
+
+		goto out;
 	}

-	if ((level + 1 < BTRFS_MAX_LEVEL) && p->locks[level + 1]) {
-		/*
-		 * Reduce lock contention at high levels of the btree by
-		 * dropping locks before we read.  Don't release the lock
-		 * on the current level because we need to walk this node
-		 * to figure out which blocks to read.
-		 */
+	if (unlock_up) {
 		btrfs_unlock_up_safe(p, level + 1);
 		ret = -EAGAIN;
 	} else {
@@ -1478,6 +1487,7 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
 	if (!extent_buffer_uptodate(tmp))
 		ret = -EIO;

+out:
 	if (ret == 0) {
 		*eb_ret = tmp;
 	} else {